There is an increasing demand for next generation wireless networks, including wireless
local area networks and the third generation cellular networks, that can provide high data rates for
broadband services, improve quality of service (QoS), and support more users. The use of multiple
transmit and receive antennas can offer substantial performance improvement to a wireless
communication system by making use of the extra degrees of freedom in the spatial domain
and thus is a promising technique to satisfy this demand. Many of the current space-time coding
schemes proposed for multiple-antenna systems assume perfect timing estimation and channel
estimation to achieve the expected performance gain. A lack of timing synchronization between
the transmit and receive signals and inaccurate channel estimation could degrade
the system performance.

In the first half of this work, we investigate the problem of timing estimation in
multiple-antenna systems with the aid of training signals. A slow, independent and identically distributed
Rayleigh flat-fading channel model is considered. We derive two maximum likelihood timing
estimators based on two different approaches, namely treating the channel as deterministic and
random, and present the corresponding Cramer-Rao bounds (CRBs). Then the optimal designs
of training signals based on some figures of merit associated with the CRBs are discussed.

In the second half of this work, we study the problem of the estimation of correlated
multiple-input multiple-output (MIMO) channels with colored interference. The Bayesian channel
estimator is derived and the optimal training sequences are designed based on the mean
square error of channel estimation. We propose an algorithm to estimate the long-term channel
statistics used in the construction of the optimal training sequences. We also design an efficient
scheme to feed back the required information to the transmitter, where we can approximately
construct the optimal sequences. Numerical results show that the optimal training sequences
provide substantial performance gain for channel estimation when compared with other training
sequences.


CHAPTER  1 
INTRODUCTION 

There  is  an  increasing  demand  for  next  generation  wireless  networks,  including  wireless 
local  area  networks  and  the  third  generation  cellular  networks,  that  can  provide  high  data  rate 
for  broadband  services,  improve  quality  of  service  (QoS),  and  support  more  users.  The  use  of 
multiple  antennas  at  both  the  transmitters  and  receivers  in  wireless  communication  systems  is 
a  significant  technical  breakthrough  which  can  offer  substantial  performance  improvement  to 
wireless links by making use of the extra degrees of freedom in the spatial domain and thus
is  a  promising  technique  to  fulfill  these  requirements.  A  system  employing  multiple  transmit 
and  receive  antennas  is  often  called  a  multiple-input  multiple-output  (MIMO)  system.  Recently, 
the  MIMO  system  and  its  related  techniques  have  been  widely  considered  for  next  generation 
wireless  communication  systems  such  as  wireless  local  area  networks  (WLAN)  and  the  third 
generation  (3G)  cellular  networks.  With  multiple  antennas,  the  communication  performance  can 
be  improved  by  many  orders  of  magnitude  without  increasing  transmit  power  and  bandwidth. 
Only  more  hardware  complexity  is  needed.  This  additional  hardware  requirement  is  enabled  by 
the  increasing  computational  power  of  integrated  circuits. 

MIMO systems provide various benefits that include spatial multiplexing gain and diversity
gain. The information capacity of wireless communication systems increases significantly
by employing multiple antennas. It has been analytically proved that MIMO systems can provide
a linear increase in capacity [1,2] which is proportional to the minimum of the number
of  transmit  antennas  and  the  number  of  receive  antennas.  This  spatial  multiplexing  gain  can 
be  obtained  by  transmitting  independent  data  streams  from  different  transmit  antennas.  The 
increased  information  rate  is  achieved  without  the  requirement  of  increasing  the  transmit  power 
and  expanding  the  transmission  bandwidth. 
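
As a hedged illustration of the scaling described above (this expression is not reproduced from the present text): for an $n_r \times n_t$ channel matrix $\mathbf{H}$ with i.i.d. Rayleigh entries that is known at the receiver only, equal power per transmit antenna, and an average receive signal-to-noise ratio written here as $\mathrm{SNR}$ (a symbol introduced only for this sketch), the commonly quoted ergodic capacity and its high-SNR behavior are

```latex
\[
C = E\!\left[ \log_2 \det\!\left( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t}\, \mathbf{H}\mathbf{H}^H \right) \right]
  \;\approx\; \min(n_t, n_r)\, \log_2 \mathrm{SNR} + O(1)
  \quad \text{bits/s/Hz as } \mathrm{SNR} \to \infty .
\]
```

The prefactor $\min(n_t, n_r)$ is the spatial multiplexing gain: each additional 3 dB of SNR buys roughly $\min(n_t, n_r)$ extra bits per channel use instead of one.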


The physical characteristics of the wireless channel present a fundamental technical challenge
for reliable communications. Wireless communication channels exhibit significant signal
variations on a short-term time scale, which is known as fading. One way to mitigate the
degradation effects of fading is to employ diversity techniques which provide the receiver with
several replicas of the same transmitted signal over independent fading channels. The probability
that all the received signals experience deep fades simultaneously is then considerably reduced.
Thus diversity techniques increase the reliability of wireless links and dramatically improve the
communication performance over fading channels. The commonly used diversity techniques
include time diversity, frequency diversity and spatial diversity. Time diversity can be provided
by channel coding combined with interleaving or automatic repeat request (ARQ) schemes. In
frequency diversity, the same narrowband signal is transmitted over different frequency
bands to provide independent fading channels. Spatial diversity, also known as antenna
diversity, is obtained by the use of multiple antennas and is preferred over time diversity and
frequency diversity since it does not need to increase the transmit signal power and bandwidth.
If the fading effects between different pairs of transmit and receive antennas are approximately
independent and the transmitted signal is carefully designed, the received signals can be combined
at the receiver such that the fading of the resultant signal is greatly reduced compared to
a single antenna communication system and thus wireless link improvement is provided.
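
The diversity argument above can be made concrete with a minimal sketch. It assumes i.i.d. Rayleigh branches with unit average power, so that each power gain $|h|^2$ is exponentially distributed with mean one; the function name `deep_fade_probability` and the threshold value are illustrative choices, not taken from this work.

```python
import math


def deep_fade_probability(threshold: float, branches: int) -> float:
    """Probability that every one of `branches` independent Rayleigh branches
    has power gain |h|^2 below `threshold` (unit average power assumed)."""
    single_branch = 1.0 - math.exp(-threshold)  # exponential CDF of |h|^2
    return single_branch ** branches            # independence across branches


if __name__ == "__main__":
    # The joint deep-fade probability shrinks geometrically with the
    # number of independent branches.
    for branches in (1, 2, 4):
        print(branches, deep_fade_probability(0.1, branches))
```

The exponent on the single-branch probability is the diversity order; it is what the independence assumption between antenna pairs buys.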

Space-time coding (STC) is one key technique that has been introduced to provide enhanced
performance for wireless communication systems equipped with multiple antennas.
Space-time codes are designed to use the extra degrees of freedom in the spatial domain provided
by extra antennas. They incorporate temporal and spatial correlations into the signals
from different transmit antennas to achieve transmit diversity and provide spatial multiplexing
gain. The main classes of space-time codes include the Bell Labs layered space-time architecture
(BLAST), space-time trellis codes (STTC) and space-time block codes (STBC).

Tarokh et al. [3] proposed space-time trellis codes which can provide full diversity gain at
the receiver. Since then, many efforts have been made to improve the originally designed
space-time trellis codes [4, 5]. Since space-time trellis codes are designed based on trellis codes, they
provide additional coding gain. However, the Viterbi algorithm has to be employed for the optimal
decoder of STTC, and thus the decoding complexity grows exponentially with the memory
length of the trellis codes and the number of antennas.

To reduce the decoding complexity, Alamouti introduced a simple space-time block coding
scheme for a two transmit antenna system which can provide full diversity gain without sacrificing
the transmission data rate [6]. The scheme was extended to more than two transmit antennas
based on the theory of orthogonal designs [7, 8, 9]. Space-time block codes can be decoded using
much simpler linear processing at the receiver compared with the Viterbi algorithm required
for space-time trellis codes. Although space-time block codes achieve the same diversity gain
as space-time trellis codes for the same number of transmit antennas, they do not provide any
significant coding gain. As a compromise between STBC and STTC, schemes that concatenate
traditional trellis codes with space-time block codes to obtain additional coding
gain have been proposed [10-14].
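
The linear decoding mentioned above can be sketched for the two-antenna block code of [6]. This is a minimal illustration under stated assumptions: one receive antenna, a channel that is constant over the two symbol periods and perfectly known at the receiver, and hypothetical function names (`alamouti_encode`, `alamouti_receive`, `alamouti_combine`) chosen for this example.

```python
def alamouti_encode(s1: complex, s2: complex):
    """Return the 2x2 code block: one (antenna 1, antenna 2) pair per symbol period."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def alamouti_receive(block, h1: complex, h2: complex, noise=(0j, 0j)):
    """Single receive antenna; h1 and h2 are assumed constant over both periods."""
    return [h1 * x1 + h2 * x2 + n for (x1, x2), n in zip(block, noise)]


def alamouti_combine(received, h1: complex, h2: complex):
    """Linear combining that decouples the two symbols using the known channel."""
    r1, r2 = received
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


if __name__ == "__main__":
    h1, h2 = 0.8 - 0.3j, -0.2 + 0.5j
    received = alamouti_receive(alamouti_encode(1 + 1j, -1 + 1j), h1, h2)
    print(alamouti_combine(received, h1, h2))
```

Each symbol estimate is scaled by $|h_1|^2 + |h_2|^2$ before normalization, which is where the two-branch diversity comes from. The combiner also shows why such schemes rely on accurate channel and timing knowledge at the receiver.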

BLAST [15, 16] is the first space-time coding scheme proposed for MIMO systems which
provides spatial multiplexing. In BLAST, multiple independent data streams are transmitted
from different transmit antennas, and are extracted at the receiver by using interference nulling and
successive interference cancelation. This decoding scheme, which operates
in the spatial domain, is similar to the successive interference cancelation proposed for
multiuser detection [17] in CDMA systems. Field tests showed that BLAST provides a substantial
increase of data rates for wireless communication systems operating in practical channels
[18].

1.1     Timing  Estimation  for  Rayleigh  Flat-fading  MIMO  Channels

To achieve the performance gain promised by the multiple antenna system, parameter
estimations including channel estimation, timing estimation and frequency offset estimation are
key  components  of  the  space-time  system  design.  Both  channel  estimation  and  frequency  offset 
estimation  for  MIMO  systems  have  been  extensively  studied  in  the  literature  [19,  20]. 

An issue that has not been sufficiently explored is timing synchronization in
multiple-antenna systems. Inaccuracies in timing synchronization can degrade the performance of such
communication systems in much the same way as MIMO channel estimation and frequency offset
estimation errors do. For instance, many of the current space-time coding schemes proposed
for multiple-antenna systems assume perfect knowledge of timing and channel gains at the receiver
in order to achieve the promised diversity gain and capacity improvement.
The performance of these systems may be limited by the accuracy of timing estimation. One
objective of this work is to study the problem of timing estimation for a wireless communication
system employing multiple transmit and receive antennas in a Rayleigh flat-fading channel
environment.

1.2     Channel  Estimation  for  Correlated  MIMO  Channels  with  Colored  Interference

For the multiple antenna communication system, theoretical analysis [1, 2, 15] shows that
the capacity increases linearly with the number of antennas under the assumption that the channel
gains between different transmit and receive antennas are independent and identically distributed
(i.i.d.). The i.i.d. assumption is reasonable for sufficiently rich scattering environments. On the
other  hand,  it  is  also  important  to  analyze  the  capacities,  design  optimal  transmission  strategies, 
and  investigate  the  related  channel  parameter  estimation  problem  for  MIMO  systems  in  more 
realistic  situations  which  include  spatially  correlated  channels  and  colored  interference. 

In a more realistic channel environment, fading correlation exists between the different
transmit antennas and receive antennas. It was shown [21] that the capacity of correlated MIMO
channels still grows linearly with the number of antennas, but the growth rate is affected by the
channel correlations and is smaller than that in independent fading channels. Based on the capacity
results for correlated MIMO channels, optimal transmission strategies [22-25] have been
widely investigated. Jorswieck et al. [24] investigated correlated Rayleigh flat-fading MIMO
systems with perfect channel state information at the receiver and the channel covariance information
fed back to the transmitter. It was shown that transmitting signals along the directions
of the eigenvectors of the transmit correlation matrix is the optimal transmission strategy.

The  capacity  of  MIMO  channels  has  also  been  investigated  for  wireless  communication 
systems  with  colored  interference.  The  scenario  arises  in  cellular  systems  where  the  users  in 
one  cell  suffer  from  the  co-channel  interference  from  the  users  in  other  cells  due  to  frequency 
reuse,  or  in  ad  hoc  networks  where  each  transmitter-receiver  pair  suffers  from  the  interference 
from  other  transmitter-receiver  pairs  operating  in  the  same  frequency  band.  In  Lozano  et  al. 
[26], the capacity of MIMO systems in the presence of spatially colored interference was
investigated. It was shown that the capacity increases with the interference spatial correlation
and the lowest capacity is achieved when the interference is white. In Moustakas et al. [27], the
authors provided analytical expressions for the statistics of the mutual information for spatially
correlated channels in the presence of interference.

Channel estimation is necessary for coherent detection in multiple antenna communication
systems. Inaccurate channel estimation could degrade the system performance substantially.
There are few works considering the channel estimation problem for MIMO systems
in realistic situations, which include both spatially correlated channels and interference.
Another objective of this work is therefore to investigate the problem of estimating correlated MIMO
channels with colored interference.

1.3     Organization  of  the  Dissertation

The  dissertation  is  organized  in  the  following  manner.  The  timing  estimation  problem 
for  MIMO  systems  with  the  aid  of  training  signals  is  investigated  in  Chapter  2.  In  Chapter 
3,  we  study  the  problem  of  estimating  correlated  MIMO  channels  in  the  presence  of  colored 
interference.  Conclusions  are  drawn  in  Chapter  4.  The  notation  used  in  this  dissertation  is 
summarized  in  Table  1.1  for  clarity. 


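Table 1.1: Matrix Notations

Notation                            Meaning
$\mathbf{A}$                        matrix with complex entries
$\mathbf{a}$                        column vector with complex entries
Real($\mathbf{a}$)                  real part of column vector $\mathbf{a}$
$\mathbf{I}_n$                      $n \times n$ identity matrix
$\mathbf{0}$                        zero matrix
diag($x_1, x_2, \ldots, x_n$)       diagonal matrix with $x_1, x_2, \ldots, x_n$ as the diagonal elements
$\mathbf{A}^T$                      transpose of $\mathbf{A}$
$\mathbf{A}^*$                      complex conjugate of $\mathbf{A}$
$\mathbf{A}^H$                      complex conjugate transpose (Hermitian) of $\mathbf{A}$
$\mathbf{A}^{1/2}$                  Hermitian square root of $\mathbf{A}$
vec($\mathbf{A}$)                   vector obtained by stacking columns of $\mathbf{A}$ on top of each other
tr($\mathbf{A}$)                    trace of $\mathbf{A}$
det($\mathbf{A}$)                   determinant of $\mathbf{A}$
$\mathbf{A} \otimes \mathbf{B}$     Kronecker product of $\mathbf{A}$ and $\mathbf{B}$
$\mathbf{a} > \mathbf{b}$           inequality elementwise
$\tilde{\mathbf{A}}$                matrix with real entries
$\tilde{\mathbf{a}}$                column vector with real entries
$\mathcal{CN}$                      complex Gaussian distribution
$\dot{\psi}(t)$                     the first derivative of $\psi(t)$ w.r.t. $t$
$\ddot{\psi}(t)$                    the second derivative of $\psi(t)$ w.r.t. $t$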

CHAPTER  2 
TIMING  ESTIMATION  IN  MULTIPLE-ANTENNA  SYSTEMS  OVER  RAYLEIGH  FLAT-FADING  CHANNELS

2.1     Introduction 

In  this  chapter,  we  investigate  the  timing  estimation  problem  for  a  wireless  communication 
system  employing  multiple  transmit  and  receive  antennas  with  the  aid  of  training  signals. 

Previous  related  work  was  primarily  restricted  to  acquisition  in  spread  spectrum  systems 
with  multiple  receive  antennas  [28,  29].  In  Dlugos  et  al.  [28]  and  Win  et  al.  [29],  the  maximum 
likelihood  estimator  of  the  received  code  lag  was  obtained,  and  the  error  probability  for  the 
acquisition system was derived. A deterministic but unknown channel was considered in Dlugos
et al. [28], whereas a flat Rayleigh fading channel with known statistics was assumed in Win
et al. [29]. An optimal estimator for code acquisition was derived in Shamain et al. [30] for
spatially correlated channels. In Zhang et al. [31], the performance of code acquisition in a
DS-CDMA system employing multiple transmit antennas was analyzed. Through simulations,
it  was  shown  that  the  presence  of  multiple  transmit  antennas  improved  the  code  acquisition 
performance,  relative  to  that  of  a  single-antenna  system. 

Issues  related  to  parameter  estimation  of  signals  received  by  an  array  of  antennas  have 
also  been  treated  in  the  radar  array  signal  processing  literature  [32,  33].  Time  delay  and  spatial 
signature  estimation  of  known  signals  received  by  an  array  of  antennas  was  investigated  in 
Swindlehurst et al. [34]. ML algorithms and the Cramer-Rao bound for time delay and array
calibration  estimation  were  developed,  and  some  computationally  efficient  approximations  of 
the  ML  algorithms  were  proposed.  In  Dogandzic  et  al.  [35],  ML  methods  were  developed  for 
space-time  fading  channel  estimation  with  an  antenna  array  in  spatially  correlated  noise.  The 
CRBs  for  the  unknown  directions  of  arrival,  time  delays,  and  Doppler  shifts  were  derived,  under 
a  structured  and  unstructured  array  response  model. 

In the present work, we consider a wireless communication system with multiple transmit
and receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh
flat-fading environment. The goal is to investigate the problem of timing estimation in such
a system with the aid of training signals. One of the main questions that we try to answer is
how to design the optimal training signals. We investigate the timing estimation problem under
two approaches. In the first approach, the channel is assumed to be unknown and deterministic,
and joint estimation of the channel and delay is carried out. We derive an ML estimator
for joint channel and timing estimation, and compute the associated CRB. Then we discuss
the optimal training signals with respect to two performance measures based on the CRB: the
outage probability that the CRB is larger than a threshold and the average CRB. We show that
the optimal training scheme is one wherein orthogonal training signals from multiple transmit
antennas are used. In the second approach, the channel is assumed to be unknown but random
with known statistics. We use the likelihood function averaged over all random channel realizations
to obtain the ML estimator for the delay. We derive the associated CRB and study the
optimal training scheme in terms of minimizing the CRB. We show that perfectly correlated
training signals employed at different transmit antennas constitute the optimal transmit scheme,
in contrast to orthogonal training signals in the first approach.

The rest of this chapter is organized in the following manner. The system model is introduced
in Section 2.2. In Section 2.3, we consider the timing estimation problem when the
channel  is  assumed  to  be  unknown  but  deterministic.  In  Section  2.4,  we  study  the  problem  of 
timing  estimation  with  the  assumption  that  the  channel  is  random  but  with  known  statistics.  In 
both  sections,  we  derive  the  ML  timing  estimators  and  compute  the  associated  CRBs.  Optimal 
training  signal  designs  are  discussed  based  on  the  corresponding  CRBs.  In  Section  2.5,  some 
discussions  comparing  these  two  timing  estimation  approaches  are  provided. 

2.2     System  Model 

We consider a single-user MIMO system with $n_t$ transmit antennas and $n_r$ receive antennas.
We assume a quasi-static (block fading) channel where the channel varies slowly enough
to be considered invariant over a block. However, the channel changes to an independent value
from block to block. By using the unstructured array model [33], the received baseband signals
at the receive antennas are given in vector form by

\[
\mathbf{r}(t) = \sum_{k=1}^{n_t} \mathbf{h}_k s_k(t-\tau) + \mathbf{n}(t), \tag{2.1}
\]

where $\mathbf{h}_k = [h_{k1}, h_{k2}, \ldots, h_{kn_r}]^T$ with $h_{kj}$ denoting the channel gain from the $k$th transmit
antenna to the $j$th receive antenna, $\mathbf{r}(t)$ is the $n_r \times 1$ received signal vector from the receive antenna
array and $s_k(t)$ is the transmitted training signal from the $k$th transmit antenna. Define the
channel vector as $\mathbf{h} = [\mathbf{h}_1^T, \mathbf{h}_2^T, \ldots, \mathbf{h}_{n_t}^T]^T$. Also, $\mathbf{n}(t)$ is a complex, circular-symmetric, white
Gaussian noise process with zero mean and covariance matrix $E[\mathbf{n}(t)\mathbf{n}(u)^H] = \sigma^2 \mathbf{I}_{n_r} \delta(t-u)$.
The symbol $\tau$ denotes the unknown, deterministic delay to be estimated. This model assumes
that the delays between all pairs of transmit and receive antennas are the same. This corresponds
to the case in which the distance between the transmit and receive antenna arrays is much larger
than the sizes of the arrays.

We consider the Rayleigh flat-fading channel model, in which the channel coefficients
$h_{ij}$ are i.i.d. complex, circular-symmetric, zero-mean Gaussian random variables with the
$\mathcal{CN}(0, \rho^2)$ distribution, i.e.,

\[
E[\mathbf{h}_k \mathbf{h}_k^H] = \rho^2 \mathbf{I}_{n_r}, \quad E[\mathbf{h}_k \mathbf{h}_k^T] = \mathbf{0}, \quad \text{and} \quad E[\mathbf{h}_i \mathbf{h}_j^H] = E[\mathbf{h}_i \mathbf{h}_j^T] = \mathbf{0}, \quad \text{for } i \neq j.
\]

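The conditional likelihood function of $\mathbf{r}(t)$, given the unknown $\tau$ and $\mathbf{h}$, can be written as

\[
p(\mathbf{r}(t) \mid \tau, \mathbf{h}) = \pi^{-n_r} \sigma^{-2n_r} \exp\left( -\frac{1}{\sigma^2} \int_0^{T_0} \Big\| \mathbf{r}(t) - \sum_{k=1}^{n_t} \mathbf{h}_k s_k(t-\tau) \Big\|^2 dt \right), \tag{2.2}
\]

where we have assumed that the training signals $s_k(t)$, for $k = 1, \ldots, n_t$, have finite durations,
and the observation interval $T_0$ is larger than the sum of the maximum training signal duration
and the maximum possible value of $\tau$. Thus the transmitted training signals are observed in full
at the receiver.

We can simplify the exponent of the likelihood function to find the sufficient statistics for
the estimation of the delay $\tau$:

\begin{align*}
\int_0^{T_0} \Big\| \mathbf{r}(t) - \sum_{k=1}^{n_t} \mathbf{h}_k s_k(t-\tau) \Big\|^2 dt
&= \int_0^{T_0} \mathbf{r}^H(t)\mathbf{r}(t)\,dt - 2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \int_0^{T_0} \mathbf{r}^H(t) s_k(t-\tau)\,dt\; \mathbf{h}_k \right\} \\
&\quad + \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} \mathbf{h}_i^H \mathbf{h}_j \int_0^{T_0} s_i^*(t-\tau) s_j(t-\tau)\,dt \\
&= \text{const} - 2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \int_0^{T_0} \mathbf{r}^H(t) s_k(t-\tau)\,dt\; \mathbf{h}_k \right\}
 + \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} \mathbf{h}_i^H \mathbf{h}_j \int_0^{T_0} s_i^*(t) s_j(t)\,dt,
\end{align*}

where the term const represents the part which does not depend on the delay $\tau$ and the channel
$\mathbf{h}$. Also, the last equality holds due to the assumption that $T_0$ is larger than the sum of the
maximum training signal duration and the maximum possible delay.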

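Denote the matched filter output corresponding to the $k$th transmit signal by

\[
\mathbf{r}_k(\tau) = \int_0^{T_0} \mathbf{r}^*(t) s_k(t-\tau)\,dt, \qquad k = 1, 2, \ldots, n_t. \tag{2.3}
\]

Note that $\mathbf{r}(\tau) = [\mathbf{r}_1(\tau)^T, \mathbf{r}_2(\tau)^T, \ldots, \mathbf{r}_{n_t}(\tau)^T]^T$ provides sufficient statistics for estimating $\tau$.

With this notation, we then have

\[
2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \int_0^{T_0} \mathbf{r}^H(t) s_k(t-\tau)\,dt\; \mathbf{h}_k \right\} = 2\,\mathrm{Re}\{ \mathbf{r}(\tau)^T \mathbf{h} \}. \tag{2.4}
\]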

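Denote the crosscorrelation between the training signals from the $i$th and $j$th transmit antennas
as

\[
\Gamma_{ij} = \int_0^{T_0} s_i^*(t) s_j(t)\,dt, \tag{2.5}
\]

which forms the $(i,j)$th element of the correlation matrix $\mathbf{\Gamma}$. Let $\mathbf{C} = \mathbf{\Gamma} \otimes \mathbf{I}_{n_r}$. Then, we have

\[
\sum_{i=1}^{n_t} \sum_{j=1}^{n_t} \mathbf{h}_i^H \mathbf{h}_j \int_0^{T_0} s_i^*(t) s_j(t)\,dt = \mathbf{h}^H \mathbf{C} \mathbf{h}. \tag{2.6}
\]

From (2.4) and (2.6), the conditional likelihood function of $\mathbf{r}(\tau)$, given the unknowns $\tau$ and $\mathbf{h}$,
can be written as

\[
p(\mathbf{r}(\tau) \mid \tau, \mathbf{h}) = \pi^{-n_r} \sigma^{-2n_r} \exp\left[ -\frac{1}{\sigma^2} \left( \text{const} - 2\,\mathrm{Re}\{ \mathbf{r}(\tau)^T \mathbf{h} \} + \mathbf{h}^H \mathbf{C} \mathbf{h} \right) \right]. \tag{2.7}
\]

Let $\tilde{\mathbf{r}}(\tau) = [\mathrm{Re}(\mathbf{r}(\tau))^T, -\mathrm{Im}(\mathbf{r}(\tau))^T]^T$, $\tilde{\mathbf{h}} = [\mathrm{Re}(\mathbf{h})^T, \mathrm{Im}(\mathbf{h})^T]^T$, and

\[
\tilde{\mathbf{C}} = \frac{1}{2} \begin{bmatrix} \mathrm{Re}(\mathbf{C}) & -\mathrm{Im}(\mathbf{C}) \\ \mathrm{Im}(\mathbf{C}) & \mathrm{Re}(\mathbf{C}) \end{bmatrix}.
\]

By using the isomorphism between real and complex matrices
[36], we have $2\,\mathrm{Re}\{\mathbf{r}(\tau)^T \mathbf{h}\} = 2\tilde{\mathbf{r}}(\tau)^T \tilde{\mathbf{h}}$ and $\mathbf{h}^H \mathbf{C} \mathbf{h} = 2\tilde{\mathbf{h}}^T \tilde{\mathbf{C}} \tilde{\mathbf{h}}$. In terms of these real quantities,
the conditional likelihood function of $\tilde{\mathbf{r}}(\tau)$ is then

\[
p(\tilde{\mathbf{r}}(\tau) \mid \tau, \tilde{\mathbf{h}}) = \pi^{-n_r} \sigma^{-2n_r} \exp\left[ -\frac{1}{\sigma^2} \left( \text{const} - 2\tilde{\mathbf{r}}(\tau)^T \tilde{\mathbf{h}} + 2\tilde{\mathbf{h}}^T \tilde{\mathbf{C}} \tilde{\mathbf{h}} \right) \right]. \tag{2.8}
\]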
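
The two real/complex identities used to pass from (2.7) to (2.8) can be checked numerically. The sketch below assumes the conventions stated in the text, namely $\tilde{\mathbf{r}} = [\mathrm{Re}(\mathbf{r})^T, -\mathrm{Im}(\mathbf{r})^T]^T$, $\tilde{\mathbf{h}} = [\mathrm{Re}(\mathbf{h})^T, \mathrm{Im}(\mathbf{h})^T]^T$ and a factor of one half in $\tilde{\mathbf{C}}$; the function names and the test values are hypothetical.

```python
def real_forms(r, h, C):
    """Build the real-valued counterparts of r, h and the Hermitian matrix C."""
    n = len(h)
    r_t = [x.real for x in r] + [-x.imag for x in r]
    h_t = [x.real for x in h] + [x.imag for x in h]
    C_t = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            C_t[i][j] = 0.5 * C[i][j].real
            C_t[i][j + n] = -0.5 * C[i][j].imag
            C_t[i + n][j] = 0.5 * C[i][j].imag
            C_t[i + n][j + n] = 0.5 * C[i][j].real
    return r_t, h_t, C_t


def check_identities(r, h, C):
    """Return (complex form, real form) pairs for the linear and quadratic terms."""
    r_t, h_t, C_t = real_forms(r, h, C)
    n, m = len(h), len(h_t)
    lin_complex = 2 * sum(a * b for a, b in zip(r, h)).real        # 2 Re{r^T h}
    lin_real = 2 * sum(a * b for a, b in zip(r_t, h_t))            # 2 r~^T h~
    quad_complex = sum(h[i].conjugate() * C[i][j] * h[j]
                       for i in range(n) for j in range(n)).real   # h^H C h
    quad_real = 2 * sum(h_t[i] * C_t[i][j] * h_t[j]
                        for i in range(m) for j in range(m))       # 2 h~^T C~ h~
    return (lin_complex, lin_real), (quad_complex, quad_real)


if __name__ == "__main__":
    r = [1 + 2j, -0.5 + 1j]
    h = [0.3 - 0.4j, 1.1 + 0.2j]
    C = [[2 + 0j, 0.5 - 1j], [0.5 + 1j, 3 + 0j]]  # Hermitian example
    print(check_identities(r, h, C))
```

The sign flip on the imaginary part of $\mathbf{r}(\tau)$ is what turns the bilinear form $\mathrm{Re}\{\mathbf{r}^T\mathbf{h}\}$, which has no conjugate, into an ordinary real inner product.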


2.3     Timing  Estimation  with  Unknown  Deterministic  Channel

In this section, we will treat $\mathbf{h}$ as unknown but deterministic in the estimation process and
consider the joint estimation of the delay $\tau$ and the channel vector $\mathbf{h}$.

2.3.1     ML  Estimator 

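In this section, we develop the ML estimator for the joint estimation of the timing $\tau$ and
the channel vector $\mathbf{h}$. The joint ML estimate of $\tau$ and $\tilde{\mathbf{h}}$ maximizes the conditional likelihood
function (2.8) as a function of $\tau$ and $\tilde{\mathbf{h}}$:

\[
\max_{\tau, \tilde{\mathbf{h}}} p(\tilde{\mathbf{r}}(\tau) \mid \tau, \tilde{\mathbf{h}}) = \max_{\tau} \left\{ \max_{\tilde{\mathbf{h}}} p(\tilde{\mathbf{r}}(\tau) \mid \tau, \tilde{\mathbf{h}}) \right\}. \tag{2.9}
\]

Alternatively, we can maximize the log-likelihood function given by

\[
L = \text{const} + \frac{1}{\sigma^2} \left( 2\tilde{\mathbf{r}}(\tau)^T \tilde{\mathbf{h}} - 2\tilde{\mathbf{h}}^T \tilde{\mathbf{C}} \tilde{\mathbf{h}} \right). \tag{2.10}
\]

As suggested in (2.9), we first maximize the log-likelihood function $L$ over $\tilde{\mathbf{h}}$. Taking the first
derivative of $L$ with respect to (w.r.t.) $\tilde{\mathbf{h}}$ gives

\[
\frac{\partial L}{\partial \tilde{\mathbf{h}}} = \frac{1}{\sigma^2} \left\{ 2\tilde{\mathbf{r}}(\tau) - 4\tilde{\mathbf{C}} \tilde{\mathbf{h}} \right\}.
\]

By letting $\partial L / \partial \tilde{\mathbf{h}} = \mathbf{0}$, we get the ML estimate of the channel $\tilde{\mathbf{h}}$ as

\[
\tilde{\mathbf{h}}_{ml} = \frac{1}{2} \tilde{\mathbf{C}}^{-1} \tilde{\mathbf{r}}(\tau), \tag{2.11}
\]

where we have assumed that $\tilde{\mathbf{C}}$, i.e., $\mathbf{C}$, is nonsingular to obtain a unique estimate of the
channel. Then substituting (2.11) into (2.10) gives the ML estimate of the delay $\tau$ in the form:

\[
\hat{\tau}_{ml} = \arg\max_{\tau} \left\{ \tilde{\mathbf{r}}(\tau)^T \tilde{\mathbf{C}}^{-1} \tilde{\mathbf{r}}(\tau) \right\}. \tag{2.12}
\]

To implement the ML estimator in general, we need to conduct a line search over all possible
values of $\tau$ to maximize the above metric.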
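
The line search can be sketched in a simplified discrete-time setting. This is an illustration under stated assumptions, not the estimator's implementation in this work: signals are sampled at one sample per symbol, the delay is an integer number of samples, there are two transmit antennas, and the metric is evaluated in its equivalent complex form $\mathbf{y}(\tau)^H (\mathbf{\Gamma}^{-1} \otimes \mathbf{I}_{n_r}) \mathbf{y}(\tau)$ with $\mathbf{y}(\tau)$ the complex conjugate of $\mathbf{r}(\tau)$ in (2.3), which equals the metric in (2.12) up to a factor of two. All names (`SEQS`, `H`, `simulate`, `estimate_delay`) are hypothetical.

```python
import random

# Two hypothetical, linearly independent training sequences (n_t = 2).
SEQS = [[complex(x) for x in (1, 1, 1, -1, -1, 1, -1)],
        [complex(x) for x in (1, -1, 1, 1, -1, -1, -1)]]
# Hypothetical channel gains H[k][j]: transmit antenna k to receive antenna j.
H = [[0.8 + 0.3j, -0.2 + 0.5j], [0.1 - 0.7j, 0.6 + 0.2j]]


def simulate(seqs, h, delay, max_delay, sigma, seed=0):
    """Sampled version of (2.1): delayed training signals plus Gaussian noise."""
    rng = random.Random(seed)
    n_samples = len(seqs[0]) + max_delay
    received = []
    for j in range(len(h[0])):
        row = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
               for _ in range(n_samples)]
        for k, seq in enumerate(seqs):
            for n, symbol in enumerate(seq):
                row[delay + n] += h[k][j] * symbol
        received.append(row)
    return received


def gram_inverse_2x2(s1, s2):
    """Inverse of the 2x2 correlation matrix Gamma_ik = sum_t conj(s_i) s_k."""
    g11 = sum(abs(x) ** 2 for x in s1)
    g22 = sum(abs(x) ** 2 for x in s2)
    g12 = sum(a.conjugate() * b for a, b in zip(s1, s2))
    g21 = g12.conjugate()
    det = g11 * g22 - g12 * g21
    return [[g22 / det, -g12 / det], [-g21 / det, g11 / det]]


def ml_metric(received, seqs, gram_inv, tau):
    """y(tau)^H (Gamma^{-1} kron I) y(tau), y_k[j] = sum_t r_j(t) conj(s_k(t - tau))."""
    y = [[sum(row[tau + n] * seq[n].conjugate() for n in range(len(seq)))
          for row in received] for seq in seqs]
    total = 0.0
    for j in range(len(received)):
        for i in range(2):
            for k in range(2):
                total += (y[i][j].conjugate() * gram_inv[i][k] * y[k][j]).real
    return total


def estimate_delay(received, seqs, max_delay):
    """Exhaustive search over candidate integer delays 0..max_delay."""
    gram_inv = gram_inverse_2x2(*seqs)
    return max(range(max_delay + 1),
               key=lambda tau: ml_metric(received, seqs, gram_inv, tau))


if __name__ == "__main__":
    rx = simulate(SEQS, H, delay=3, max_delay=5, sigma=0.1)
    print("estimated delay:", estimate_delay(rx, SEQS, max_delay=5))
```

The metric is the energy of the received samples projected onto the span of the delayed training sequences, so without noise it is largest when the candidate delay matches the true one. In continuous time the same search runs over a grid of candidate delays, and the grid spacing limits the achievable resolution.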

2.3.2     Cramer-Rao  Bound

The Cramer-Rao bound gives a lower bound on the variance of any unbiased estimator
[36, 37]. It has been widely used to lower bound the mean square error (MSE) of symbol timing
estimators [38, 39]. It is well known [36, 37] that ML estimators, under mild regularity conditions
and with independent and identically distributed observations, are asymptotically unbiased
and efficient. It can be easily verified that the elements of $\mathbf{r}(\tau)$ given in (2.3) corresponding to
different receive antennas are i.i.d. observations. Thus for a particular realization of the channel
$\mathbf{h}$, the ML estimator is asymptotically efficient, i.e., it approaches the CRB as the number of
receive antennas $n_r$ becomes large. Hence the CRB is a suitable performance measure for the
ML estimator of the delay $\tau$. We will also verify the suitability of employing the CRB as a
performance metric by computer simulation examples.

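The main result of this section on the CRB is contained in the following theorem.

Theorem 2.3.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the
training signals $s_k(t)$, for $k = 1, \ldots, n_t$, exist and they are uniformly continuous on $[0, T_0]$.
Together with the standard regularity conditions in [36, 37], the Cramer-Rao bound for the
estimation of the delay $\tau$ for a given realization of the channel $\mathbf{h}$ is given by

\[
\mathrm{CRB}(\mathbf{h}) = -\frac{\sigma^2}{2} \left[ \mathrm{Re}\left\{ E\left[ \frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2} \right]^T \mathbf{h} \right\} + E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^T \mathbf{C}^{-1} E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^* \right]^{-1}, \tag{2.13}
\]

where $E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right] = \left[ E\left[ \frac{\partial \mathbf{r}_1(\tau)}{\partial \tau} \right]^T, E\left[ \frac{\partial \mathbf{r}_2(\tau)}{\partial \tau} \right]^T, \ldots, E\left[ \frac{\partial \mathbf{r}_{n_t}(\tau)}{\partial \tau} \right]^T \right]^T$, $E\left[ \frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2} \right]$ is stacked in the same way, and, for $i = 1, 2, \ldots, n_t$,

\[
E\left\{ \frac{\partial^2 \mathbf{r}_i(\tau)}{\partial \tau^2} \right\} = \sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t) \ddot{s}_i(t)\,dt, \tag{2.14}
\]

\[
E\left\{ \frac{\partial \mathbf{r}_i(\tau)}{\partial \tau} \right\} = -\sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t) \dot{s}_i(t)\,dt. \tag{2.15}
\]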


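Proof. The CRB for the estimation of $\tau$ is given as

\[
\mathrm{CRB}(\mathbf{h}) = (\mathbf{I}^{-1})_{22}, \tag{2.16}
\]

where $\mathbf{I}$ is the Fisher information matrix for the joint estimation of the channel $\tilde{\mathbf{h}}$ and the delay
$\tau$, which is defined as

\[
\mathbf{I} = \begin{bmatrix} \mathbf{I}_{11} & \mathbf{I}_{12} \\ \mathbf{I}_{21} & I_{22} \end{bmatrix}
= \begin{bmatrix}
-E\left[ \dfrac{\partial^2 L}{\partial \tilde{\mathbf{h}}^2} \right] & -E\left[ \dfrac{\partial^2 L}{\partial \tilde{\mathbf{h}}\, \partial \tau} \right] \\[2ex]
-E\left[ \dfrac{\partial^2 L}{\partial \tilde{\mathbf{h}}\, \partial \tau} \right]^T & -E\left[ \dfrac{\partial^2 L}{\partial \tau^2} \right]
\end{bmatrix}. \tag{2.17}
\]

Since $\frac{\partial^2 (2\tilde{\mathbf{h}}^T \tilde{\mathbf{C}} \tilde{\mathbf{h}})}{\partial \tilde{\mathbf{h}}^2} = 4\tilde{\mathbf{C}}$ and $\frac{\partial^2 (2\tilde{\mathbf{r}}(\tau)^T \tilde{\mathbf{h}})}{\partial \tilde{\mathbf{h}}^2} = \mathbf{0}$, we have

\[
\mathbf{I}_{11} = -E\left[ \frac{\partial^2 L}{\partial \tilde{\mathbf{h}}^2} \right] = \frac{4}{\sigma^2} \tilde{\mathbf{C}}. \tag{2.18}
\]

Moreover,

\[
\mathbf{I}_{12} = -E\left[ \frac{\partial^2 L}{\partial \tilde{\mathbf{h}}\, \partial \tau} \right] = -\frac{2}{\sigma^2} E\left[ \frac{\partial \tilde{\mathbf{r}}(\tau)}{\partial \tau} \right].
\]

Let $\mathbf{v} = E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right] = E\left[ \frac{\partial \mathbf{r}_1(\tau)^T}{\partial \tau}, \ldots, \frac{\partial \mathbf{r}_{n_t}(\tau)^T}{\partial \tau} \right]^T$;
then $\mathbf{I}_{12} = -\frac{2}{\sigma^2} \left[ \mathrm{Re}(\mathbf{v})^T, -\mathrm{Im}(\mathbf{v})^T \right]^T$.

The $i$th block of $\mathbf{v}$ can be computed from

\begin{align*}
\frac{\partial \mathbf{r}_i(\tau)}{\partial \tau}
&= \int_0^{T_0} \mathbf{r}^*(t) \frac{\partial s_i(t-\tau)}{\partial \tau}\,dt \\
&= \int_0^{T_0} \left[ \sum_{k=1}^{n_t} \mathbf{h}_k^* s_k^*(t-\tau) + \mathbf{n}^*(t) \right] \frac{\partial s_i(t-\tau)}{\partial \tau}\,dt \tag{2.19} \\
&= -\sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t-\tau) \frac{\partial s_i(t-\tau)}{\partial t}\,dt + \int_0^{T_0} \mathbf{n}^*(t) \frac{\partial s_i(t-\tau)}{\partial \tau}\,dt.
\end{align*}

The fact that the noise $\mathbf{n}(t)$ is zero-mean gives

\begin{align*}
E\left[ \frac{\partial \mathbf{r}_i(\tau)}{\partial \tau} \right]
&= -\sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t-\tau) \frac{\partial s_i(t-\tau)}{\partial t}\,dt \\
&= -\sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t) \dot{s}_i(t)\,dt. \tag{2.20}
\end{align*}

Finally, $I_{22} = -\frac{2}{\sigma^2} E\left[ \frac{\partial^2 \tilde{\mathbf{r}}(\tau)}{\partial \tau^2} \right]^T \tilde{\mathbf{h}} = -\frac{2}{\sigma^2} \mathrm{Re}\left\{ E\left[ \frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2} \right]^T \mathbf{h} \right\}$. Similarly, $I_{22}$ can be computed
from the fact that

\[
E\left[ \frac{\partial^2 \mathbf{r}_i(\tau)}{\partial \tau^2} \right] = \sum_{k=1}^{n_t} \mathbf{h}_k^* \int_0^{T_0} s_k^*(t) \ddot{s}_i(t)\,dt. \tag{2.21}
\]

Applying the standard result on the inverse of a partitioned matrix to (2.16) and (2.17)
gives

\[
\mathrm{CRB}^{-1}(\mathbf{h}) = I_{22} - \mathbf{I}_{21} \mathbf{I}_{11}^{-1} \mathbf{I}_{12}. \tag{2.22}
\]

By using the relationship between real and complex matrices [36], we get

\[
\mathbf{I}_{21} \mathbf{I}_{11}^{-1} \mathbf{I}_{12} = \frac{2}{\sigma^2} E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^T \mathbf{C}^{-1} E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^*. \tag{2.23}
\]

Then the CRB of the estimation of the delay $\tau$ is

\[
\mathrm{CRB}(\mathbf{h}) = -\frac{\sigma^2}{2} \left[ \mathrm{Re}\left\{ E\left[ \frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2} \right]^T \mathbf{h} \right\} + E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^T \mathbf{C}^{-1} E\left[ \frac{\partial \mathbf{r}(\tau)}{\partial \tau} \right]^* \right]^{-1}. \tag{2.24}
\]

$\square$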


We  note  that  the  CRB  varies  with  different  choices  of  training  signals.  By  carefully  choos- 
ing the  training  signals  to  minimize  a  suitable  measure  associated  with  the  CRB,  we  can  poten- 
tially improve  the  estimation  perfonnance. 
2.3.3     Optimal  Training  Scheme 

Communication  systems  often  employ  the  same  symbol  waveforms  for  both  training  and 
data  phases.  The  choice  of  the  symbol  waveform  is  mainly  decided  by  the  performance  required 
by  data  transmissions.  In  this  section,  we  shall  make  the  following  simplifying  but  practically 
reasonable  assumptions  on  the  training  signals: 
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Assumption  1 

Let  ak  =  [ak{0):  ■■■,  ak{N  -  1)]"^  be  the  training  sequence  assigned  to  the  kth  transmit 
antenna,  and  on  this  antenna  the  training  signal  wavefonn  is  of  the  fomi 

N-i 

Sk{t)  =  Y^akmit-tT,),  (2.25) 

where  N  is  the  number  of  training  symbols  and  ip{t)  is  the  symbol  waveform.  We  call  the 
N  X  rit  matrix  A  =  [ai.  a2 a„,]  as  the  training  sequence  matrix. 

Assumption  2 

The symbol waveform $\psi(t)$ is time-limited to a single symbol period $[0, T_s]$ so that adjacent symbols do not interfere with each other. In addition, $\psi(t)$ is sufficiently smooth to guarantee the existence of uniformly continuous first and second derivatives. This condition is satisfied for most symbol waveforms of practical interest. Two typical examples are the time-domain raised-cosine pulse and the half-sine pulse.

Assumption  3 

$\mathbf{A}^H\mathbf{A}$ is nonsingular, and hence $\mathbf{\Gamma}$ and $\mathbf{C}$ are also nonsingular. We note that this implies that $N \ge n_t$.

Under  the  assumptions  stated  above,  the  CRB  for  the  timing  estimation  can  be  simplified 
to  the  expression  summarized  in  the  following  corollary. 

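Corollary 2.3.1 (Cramer-Rao bound). Given Assumptions 1-3, the CRB for the estimation of $\tau$ for a particular realization of the channel $\mathbf{h}$ reduces to

$$\mathrm{CRB}(\mathbf{h}) = -\frac{\sigma^2 \psi_b}{2\psi_a\, \mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}}, \tag{2.26}$$

where $\psi_a = \psi_b\psi_c + |\psi_d|^2$, $\psi_b = \int |\psi(t)|^2\,dt$, $\psi_c = \int \psi''(t)\psi^*(t)\,dt$, and $\psi_d = \int \psi'(t)\psi^*(t)\,dt$.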
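Proof. With the three assumptions on the training signals, we have

$$\int_0^{T_o} s_k^*(t)\,\ddot{s}_l(t)\,dt = \mathbf{a}_k^H\mathbf{a}_l \int \psi''(t)\psi^*(t)\,dt = \psi_c\,\mathbf{a}_k^H\mathbf{a}_l. \tag{2.27}$$

Then, Eqn. (2.14) can be written in terms of the training sequences as:

$$E\!\left[\frac{\partial^2 \mathbf{r}_l(\tau)}{\partial \tau^2}\right] = \psi_c \sum_{k=1}^{n_t} \mathbf{h}_k^*\,\mathbf{a}_k^H\mathbf{a}_l. \tag{2.28}$$

Thus

$$E\!\left[\frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2}\right]^{T}\mathbf{h} = \psi_c \sum_{l=1}^{n_t}\sum_{k=1}^{n_t} \mathbf{h}_l^H\mathbf{h}_k\,\mathbf{a}_l^H\mathbf{a}_k = \psi_c\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}. \tag{2.29}$$

Hence

$$I_{22} = -\frac{2}{\sigma^2}\,\mathrm{Re}\left\{E\!\left[\frac{\partial^2 \mathbf{r}(\tau)}{\partial \tau^2}\right]^{T}\mathbf{h}\right\} = -\frac{2\psi_c}{\sigma^2}\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}.$$

Moreover, (2.15) can also be simplified in terms of the training sequences as:

$$\int_0^{T_o} s_k^*(t-\tau)\,\frac{\partial s_l(t-\tau)}{\partial \tau}\,dt = -\psi_d\,\mathbf{a}_k^H\mathbf{a}_l. \tag{2.30}$$

Thus $E\!\left[\frac{\partial \mathbf{r}(\tau)}{\partial \tau}\right] = -\psi_d\,(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})^*\mathbf{h}^*$. Similarly, we have

$$\Gamma_{lk} = \int_0^{T_o} s_l^*(t)\,s_k(t)\,dt = \psi_b\,\mathbf{a}_l^H\mathbf{a}_k \tag{2.31}$$

and $\mathbf{C} = \psi_b(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})$. Hence, (2.23) can be written as

$$\mathbf{I}_{21}\mathbf{I}_{11}^{-1}\mathbf{I}_{12} = \frac{2}{\sigma^2}\,\psi_d\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})(\psi_b\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})^{-1}\psi_d^*(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h} = \frac{2|\psi_d|^2}{\sigma^2\psi_b}\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}. \tag{2.32}$$

Then the Cramer-Rao bound for the estimation of the delay $\tau$ is

$$\mathrm{CRB}(\mathbf{h}) = \left[I_{22} - \mathbf{I}_{21}\mathbf{I}_{11}^{-1}\mathbf{I}_{12}\right]^{-1} = \left[-\frac{2\psi_c}{\sigma^2}\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h} - \frac{2|\psi_d|^2}{\sigma^2\psi_b}\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}\right]^{-1} = -\frac{\sigma^2\psi_b}{2(\psi_b\psi_c + |\psi_d|^2)\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}}. \tag{2.33}$$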

By using some standard properties of the Fourier transform similar to Parseval's theorem, we have $\psi_b = \frac{1}{2\pi}\int_{-\infty}^{+\infty} |\Psi(\omega)|^2\,d\omega$, $\psi_c = -\frac{1}{2\pi}\int_{-\infty}^{+\infty} \omega^2|\Psi(\omega)|^2\,d\omega$, and $\psi_d = \frac{j}{2\pi}\int_{-\infty}^{+\infty} \omega|\Psi(\omega)|^2\,d\omega$, where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. Then according to the Cauchy-Schwarz inequality, we have

$$\psi_a = \left[\frac{1}{2\pi}\int_{-\infty}^{+\infty} \omega|\Psi(\omega)|^2\,d\omega\right]^2 - \left[\frac{1}{2\pi}\int_{-\infty}^{+\infty} \omega^2|\Psi(\omega)|^2\,d\omega\right]\left[\frac{1}{2\pi}\int_{-\infty}^{+\infty} |\Psi(\omega)|^2\,d\omega\right] < 0. \tag{2.34}$$

Since $\psi_b > 0$, we have $-\psi_b/\psi_a > 0$, which implies that the expression of the CRB given in (2.33) is nontrivial.

As a result, the dependence of the CRB on the training signals $s_k(t)$, for $k = 1, \ldots, n_t$, simplifies into that on the training sequence matrix $\mathbf{A}$ and the symbol waveform $\psi(t)$. In the following two subsections, we optimize the training sequence matrix $\mathbf{A}$ in terms of two performance measures, namely the outage probability that the CRB is larger than a threshold and the average CRB over all channel realizations.
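
As an illustrative sketch (not part of the original derivation), the waveform constants in Corollary 2.3.1 can be evaluated numerically for the half-sine pulse $\psi(t) = \sin(\pi t/T_s)$ mentioned in Assumption 2. The function names `waveform_constants` and `crb_given_channel`, the midpoint integration rule, and the choice $T_s = 1$ are assumptions made only for this example; the pulse is taken to be real, so the conjugates drop out.

```python
import math


def waveform_constants(psi, d_psi, d2_psi, T_s, steps=20000):
    """Midpoint-rule estimates of psi_b, psi_c, psi_d for a real pulse on [0, T_s]."""
    dt = T_s / steps
    psi_b = psi_c = psi_d = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        psi_b += psi(t) ** 2 * dt          # integral of |psi|^2
        psi_c += d2_psi(t) * psi(t) * dt   # integral of psi'' psi
        psi_d += d_psi(t) * psi(t) * dt    # integral of psi' psi
    return psi_b, psi_c, psi_d


# Half-sine pulse with T_s = 1 (illustrative choice).
T_s = 1.0
w = math.pi / T_s
psi_b, psi_c, psi_d = waveform_constants(
    lambda t: math.sin(w * t),
    lambda t: w * math.cos(w * t),
    lambda t: -w * w * math.sin(w * t),
    T_s,
)
psi_a = psi_b * psi_c + abs(psi_d) ** 2  # negative, in line with (2.34)


def crb_given_channel(gain, sigma2):
    """Evaluate (2.26), where gain stands for h^H (A^H A kron I) h."""
    return -sigma2 * psi_b / (2.0 * psi_a * gain)
```

For this pulse the symmetry about $T_s/2$ makes $\psi_d$ vanish, so $\psi_a = \psi_b\psi_c$ and the bound depends on the waveform only through the ratio $-\psi_c/\psi_b = (\pi/T_s)^2$.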

Outage  probability 

In this subsection, the outage probability that the CRB is larger than the threshold $\epsilon$, i.e., $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon)$, is used as a performance measure with respect to which the training signals from different transmit antennas are optimized.

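Write the spectral decomposition of $\mathbf{A}^H\mathbf{A}$ as $\mathbf{A}^H\mathbf{A} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^H$, where $\mathbf{U}$ is a unitary matrix and $\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_{n_t}\}$ is the diagonal matrix containing the positive eigenvalues of $\mathbf{A}^H\mathbf{A}$. The design of the optimal training scheme can now be formulated as the following optimization problem:

$$\min_{\mathbf{A}}\ \Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} = \sum_{l=1}^{n_t} \lambda_l \le \frac{P}{\psi_b}, \quad \lambda_l > 0, \quad l = 1, \ldots, n_t, \tag{2.35}$$
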
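where $\psi_b\,\mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} \le P$ specifies a constraint on the total transmit power.

First, we consider a simple but important case: 2 transmit antennas and 1 receive antenna. In this case, the optimization problem (2.35) can be simplified as follows. Starting from Corollary 2.3.1, we have

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = \Pr\left(\mathbf{h}^H\mathbf{A}^H\mathbf{A}\mathbf{h} < -\frac{\sigma^2\psi_b}{2\psi_a\epsilon}\right). \tag{2.36}$$

With the spectral decomposition of $\mathbf{A}^H\mathbf{A}$, $\mathbf{h}^H\mathbf{A}^H\mathbf{A}\mathbf{h} = \mathbf{h}^H\mathbf{U}\mathbf{\Lambda}\mathbf{U}^H\mathbf{h} = \mathbf{h}'^H\mathbf{\Lambda}\mathbf{h}' = \lambda_1|h'_1|^2 + \lambda_2|h'_2|^2$, where $\mathbf{h}' = \mathbf{U}^H\mathbf{h}$. Since $\mathbf{h}$ is a random vector with i.i.d. complex, circular-symmetric, zero-mean Gaussian elements and $\mathbf{U}$ is a unitary matrix, $\mathbf{h}'$ is also a complex Gaussian random vector with the same distribution as $\mathbf{h}$. We note that $|h'_i|^2$ has the exponential distribution with $E(|h'_i|^2) = \rho^2$.

Let $X_i = |h'_i|^2/\rho^2$ and $c_i = \psi_b\lambda_i/P$, for $i = 1, 2$, then

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = \Pr\left(c_1X_1 + c_2X_2 < -\frac{\sigma^2\psi_b^2}{2\psi_aP\rho^2\epsilon}\right), \tag{2.37}$$

where $X_1$ and $X_2$ are independent random variables with exponential distribution and $E(X_1) = E(X_2) = 1$. The total power constraint $\sum_{i=1}^{2}\lambda_i \le P/\psi_b$ is equivalent to $c_1 + c_2 \le 1$. Hence the optimization problem can be rewritten in the following simple form:

$$\min_{c_1, c_2}\ \Pr\left(c_1X_1 + c_2X_2 < -\frac{\sigma^2\psi_b^2}{2\psi_aP\rho^2\epsilon}\right) \quad \text{subject to} \quad c_1 + c_2 \le 1, \quad \text{and} \quad c_1, c_2 > 0. \tag{2.38}$$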

In order to solve the above optimization problem, we employ the following result on the Schur-convexity¹ of the distribution function of the linear combination of two exponential random variables [41].

Lemma 2.3.1. Let $X_1$ and $X_2$ be independent random variables with exponential distribution, and $E(X_1) = E(X_2) = 1$. Then the function

$$F(c_1, c_2, x) = \Pr(c_1X_1 + c_2X_2 \le x), \quad \text{where } c_1 + c_2 = 1 \text{ and } c_1, c_2 > 0,$$

is Schur-convex on $(c_1, c_2)$ if $x \le 1$, and it is Schur-concave on $(c_1, c_2)$ if $x \ge 3/2$.

¹ A detailed description of Schur-convexity and majorization can be found in Marshall et al. [40].

Using the above lemma and considering the region in which the CRB threshold $\epsilon \ge -\frac{\sigma^2\psi_b^2}{2\psi_aP\rho^2}$, the optimization cost function in (2.38) is a Schur-convex function on $(c_1, c_2)$. Thus minimization of the cost function occurs if and only if $c_1 = c_2 = \frac{1}{2}$, i.e., $\lambda_1 = \lambda_2 = \frac{P}{2\psi_b}$ [40]. This implies that the optimal $\mathbf{A}$ is such that $\mathbf{A}^H\mathbf{A} = \frac{P}{2\psi_b}\mathbf{I}_2$. The optimal training scheme is summarized in the following theorem.

Theorem 2.3.2. Suppose that the CRB threshold $\epsilon \ge -\frac{\sigma^2\psi_b^2}{2\psi_aP\rho^2}$. Then the training sequence matrix $\mathbf{A}$ such that $\mathbf{A}^H\mathbf{A} = \frac{P}{2\psi_b}\mathbf{I}$ minimizes the outage probability of the CRB for a system with 2 transmit antennas and 1 receive antenna. That is, the optimal training sequences from different transmit antennas are orthogonal to each other and have equal powers.

As we shall see from the discussion in the next subsection on the average CRB (Corollary 2.3.2), the value $-\frac{\sigma^2\psi_b^2}{2\psi_aP\rho^2}$ is exactly one half of the average CRB over all channel realizations. Thus, it is reasonable to consider the stated region of the CRB threshold.
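
The behaviour described by Lemma 2.3.1 can be checked numerically. The sketch below is an illustration under our own assumptions: it uses the standard closed-form distribution function of a weighted sum of two independent unit-mean exponential variables, and the name `outage_cdf` is hypothetical. The split $(1, 0)$ is included only as a limiting case of the lemma's domain.

```python
import math


def outage_cdf(c1, c2, x):
    """Pr(c1*X1 + c2*X2 <= x) for independent unit-mean exponential X1, X2."""
    if c1 < c2:
        c1, c2 = c2, c1
    if c2 == 0.0:
        # Only one term is active: plain exponential with mean c1.
        return 1.0 - math.exp(-x / c1)
    if abs(c1 - c2) < 1e-9:
        # Equal weights: Erlang distribution of order 2.
        return 1.0 - math.exp(-x / c1) * (1.0 + x / c1)
    # Unequal weights: hypoexponential distribution.
    return 1.0 - (c1 * math.exp(-x / c1) - c2 * math.exp(-x / c2)) / (c1 - c2)


# Power splits ordered from equal to most unequal.
splits = [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1), (1.0, 0.0)]
at_x_1 = [outage_cdf(a, b, 1.0) for a, b in splits]    # increases with the spread
at_x_15 = [outage_cdf(a, b, 1.5) for a, b in splits]   # decreases with the spread
```

In the notation of (2.38), the argument `x` corresponds to $-\sigma^2\psi_b^2/(2\psi_aP\rho^2\epsilon)$, so $x \le 1$ is the threshold region of Theorem 2.3.2, in which the equal split gives the smallest outage probability among the splits tested.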

It seems natural that a result analogous to the one in Lemma 2.3.1 be true for the more general case. While the proof of such a result remains open, there is strong evidence regarding the Schur-convexity of the function $F(c_1, \ldots, c_{n_t}, x) = \Pr(c_1X_1 + \cdots + c_{n_t}X_{n_t} \le x)$, where $X_i$, for $i = 1, \ldots, n_t$, are independent random variables with unit-mean exponential distribution. The following conjecture has been advanced in Merkle et al. [41], supported by some strong numerical results.

Conjecture 2.3.1. The family of unimodal distribution functions $F(c_1, \ldots, c_{n_t}, x)$ is increasing with respect to the variance (i.e., Schur-convex) for small values of $x$, and decreasing (i.e., Schur-concave) for large values of $x$.

Based on the above conjecture, we conjecture that the result in Theorem 2.3.2 extends to the case of arbitrary numbers of transmit and receive antennas:

Conjecture 2.3.2. When $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}$, the outage probability of the CRB is minimized if the CRB threshold $\epsilon$ is not too small. Thus the optimal training sequences from different transmit antennas, in terms of minimizing the outage probability, are orthogonal to each other and have equal powers.

In Hassibi et al. [42], the authors assumed perfect timing estimation and studied the problem of choosing the optimal training sequences for channel estimation to maximize a lower bound on the capacity of the channel that was learned by training. The optimal training sequences for channel estimation turned out to have the same structure as those we get here for timing estimation.

To illustrate our conjecture on the optimality of orthogonal sequences, we have carried out a large number of numerical calculations. In the broad region of $\epsilon$ that we are interested in, we have not observed the existence of any other schemes which can achieve a lower outage probability than the orthogonal training signals. In Fig. 2.1, we plot, for instance, the outage probabilities $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon)$ for a system with 4 transmit antennas and a single receive antenna employing different training signal sets. Note that since $P$ is the total transmit power constraint, the signal-to-noise ratio (SNR) $P\rho^2/\sigma^2$ here should be understood as the total SNR for the whole training period instead of the SNR for one symbol period. The time-domain raised-cosine pulse is used as the symbol waveform. The results in the figure suggest that the orthogonal training signals are optimal and can provide a significant performance gain over the other training signals.

In Fig. 2.2, we compare the outage performance of orthogonal training sequences for different numbers of transmit antennas. The results in the figure show that the use of multiple transmit antennas can offer substantial estimation performance improvement over a single-antenna system. For example, if we consider the outage probability $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = 0.1$, the two-transmit antenna system can achieve a 4 dB performance gain and the four-transmit antenna system can achieve a 6 dB performance gain. The performance gap grows with decreasing outage probability.

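More precisely, the outage probability for orthogonal training signals is given by

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = \Pr\left(\mathbf{h}^H\mathbf{h} < -\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\epsilon}\right) = 1 - \exp\left(\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\rho^2\epsilon}\right)\sum_{k=0}^{n_tn_r-1}\frac{1}{k!}\left(-\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\rho^2\epsilon}\right)^{k}, \tag{2.39}$$

where the second equality is obtained from the fact that $\mathbf{h}^H\mathbf{h}$ is $\chi^2_{2n_tn_r}$-distributed [43]. From (2.39), it is not hard to see that when the SNR is large, i.e., $\frac{P\rho^2}{\sigma^2} \gg -\frac{\psi_b^2n_t}{2\psi_a\epsilon}$, the outage probability is approximately given by

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) \approx \frac{1}{(n_tn_r)!}\left(-\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\rho^2\epsilon}\right)^{n_tn_r}. \tag{2.40}$$

Eqn. (2.40) indicates that the outage probability decreases with the $(n_tn_r)$th power of the reciprocal of the SNR. The power $n_tn_r$ is usually referred to as the diversity order of the system [43]. Thus we conclude that the use of multiple transmit and receive antennas (with orthogonal training signals) provides spatial diversity for timing estimation in the same way as space-time coding does for demodulation [1, 15].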
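
To make the diversity-order statement concrete, the following sketch evaluates the exact outage probability for orthogonal training signals and its high-SNR approximation. It is an illustration only: the function names are hypothetical, and the argument `u` is our shorthand for the normalized threshold $-\sigma^2\psi_b^2n_t/(2\psi_aP\rho^2\epsilon)$, which is inversely proportional to the SNR.

```python
import math


def outage_orthogonal(u, n_t, n_r):
    """Exact Pr(h^H h / rho^2 < u) when h has n_t*n_r i.i.d. CN(0, rho^2) entries."""
    n = n_t * n_r
    # h^H h / rho^2 is Erlang distributed with shape n and unit rate.
    partial = sum(u ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-u) * partial


def outage_high_snr(u, n_t, n_r):
    """Leading term of the exact expression for small u (high SNR)."""
    n = n_t * n_r
    return u ** n / math.factorial(n)


# With two transmit antennas and one receive antenna, halving u (a 3 dB
# SNR increase) reduces the approximate outage by a factor 2**2 = 4.
ratio = outage_high_snr(0.02, 2, 1) / outage_high_snr(0.01, 2, 1)
```

Plotting either function against $1/u$ on a log-log scale would show a slope of $-n_tn_r$ at high SNR, which is the diversity order discussed above.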

An important remaining issue is whether the ML estimator can achieve the outage probability of the CRB. For each realization of the channel $\mathbf{h}$, the ML estimator is asymptotically efficient with increasing number of receive antennas $n_r$. We note that $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = E_{\mathbf{h}}[1(\mathrm{CRB}(\mathbf{h}) > \epsilon)]$, where $1(\cdot)$ is the indicator function. Because the indicator function is a bounded function, the dominated convergence theorem [44] implies that the ML estimator can achieve the outage probability of the CRB asymptotically.

To verify the suitability of using the outage probability as a performance metric when the number of receive antennas is small, we evaluate the performance of the ML estimator via Monte-Carlo simulations. In Fig. 2.3, we plot the outage probabilities of the ML estimator obtained from simulation and calculated using the CRB, respectively, for a system with two transmit antennas and employing orthogonal training sequences. It can be seen that the ML estimator gives an outage probability performance very close to that predicted by the CRB even for small values of $n_r = 1$, 2, and 4. Hence, the simulation results verify that the outage probability of the CRB provides an effective performance metric also when the number of receive antennas is small.

Average  CRB 

In  this  subsection,  we  use  the  CRB  averaged  over  the  Rayleigh  flat-fading  channel  h  as  an 
alternate  performance  measure  based  on  which  the  training  signals  from  the  transmit  antennas 
are optimized.

Figure 2.1: Outage probabilities achieved using different training signal sets for a system with 4 transmit and 1 receive antennas. The unit of the threshold $\epsilon$ is $T_s^2$. The legend lists the power splits $c_1 = c_2 = c_3 = c_4$; $c_1 = 2/3$, $c_2 = 1/6$, $c_3 = c_4 = 1/12$; $c_1 = 9/10$, $c_2 = c_3 = c_4 = 1/30$; and $c_1 = 99/100$, $c_2 = c_3 = c_4 = 1/300$.

Figure 2.2: Outage probabilities achieved using orthogonal training signals for different numbers of transmit antennas. One receive antenna is employed. The unit of the threshold $\epsilon$ is $T_s^2$.

Figure 2.3: Comparison of outage probabilities of the ML estimator obtained from simulation and calculated from the CRB. The number of transmit antennas $n_t$ is 2 and the threshold $\epsilon$ is a fixed negative power of ten times $T_s^2$.

After averaging over the Rayleigh flat-fading channel $\mathbf{h}$, the average CRB is given as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = -\frac{\sigma^2\psi_b}{2\psi_a}\,E_{\mathbf{h}}\left[\frac{1}{\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}}\right]. \tag{2.41}$$

The design of the optimal training scheme can now be formulated as the following optimization problem:

$$\min_{\mathbf{A}}\ E_{\mathbf{h}}\left[\frac{1}{\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}}\right] \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} = \sum_{i=1}^{n_t}\lambda_i \le \frac{P}{\psi_b}, \quad \lambda_i > 0, \quad i = 1, \ldots, n_t. \tag{2.42}$$

The following theorem specifies the optimal training sequence that minimizes the average CRB.

Theorem 2.3.3. When $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}$, the average CRB over the Rayleigh flat-fading channel $\mathbf{h}$ is minimized. That is, the optimal training sequences from different transmit antennas, in terms of minimizing the average CRB, are orthogonal to each other and have equal powers.

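Proof. Let $\mathbf{W} = \mathbf{U}'\mathbf{\Lambda}'\mathbf{U}'^H$, where $\mathbf{\Lambda}' = \mathrm{diag}\{\lambda'_1, \lambda'_2, \ldots, \lambda'_{n_tn_r}\}$ contains the positive eigenvalues of the Hermitian matrix $\mathbf{W}$, and $\mathbf{U}'$ is a unitary matrix. Consider the following optimization problem:

$$\min_{\mathbf{W}}\ E_{\mathbf{h}}\left[\frac{1}{\mathbf{h}^H\mathbf{W}\mathbf{h}}\right] \quad \text{subject to} \quad \mathrm{tr}\{\mathbf{W}\} = \sum_{i=1}^{n_tn_r}\lambda'_i \le \frac{n_rP}{\psi_b}, \quad \lambda'_i > 0, \quad i = 1, \ldots, n_tn_r. \tag{2.43}$$

Note that

$$E\left[\frac{1}{\mathbf{h}^H\mathbf{W}\mathbf{h}}\right] = E\left[\frac{1}{\mathbf{h}^H\mathbf{U}'\mathbf{\Lambda}'\mathbf{U}'^H\mathbf{h}}\right] = E\left[\frac{1}{\mathbf{h}'^H\mathbf{\Lambda}'\mathbf{h}'}\right] = E\left[\frac{1}{\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2}\right], \tag{2.44}$$

where $\mathbf{h}' = \mathbf{U}'^H\mathbf{h}$. As before, $\mathbf{h}'$ is a complex Gaussian random vector with the same distribution as $\mathbf{h}$.

Let $g(\boldsymbol{\lambda}') = \frac{1}{\sum_i \lambda'_ix_i}$, where $x_i > 0$ are assumed to be fixed constants. We study the convexity property of $g$.

We have

$$\frac{\partial g}{\partial \lambda'_i} = -\frac{x_i}{\left(\sum_k \lambda'_kx_k\right)^2} \quad \text{and} \quad \frac{\partial^2 g}{\partial \lambda'_i\,\partial \lambda'_j} = \frac{2x_ix_j}{\left(\sum_k \lambda'_kx_k\right)^3}.$$

Then the Hessian $\mathbf{G}(\boldsymbol{\lambda}')$ of $g$ is

$$\mathbf{G}(\boldsymbol{\lambda}') = \frac{2}{\left(\sum_k \lambda'_kx_k\right)^3}\begin{bmatrix} x_1^2 & x_1x_2 & \cdots & x_1x_{n_tn_r} \\ x_2x_1 & x_2^2 & \cdots & x_2x_{n_tn_r} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_tn_r}x_1 & x_{n_tn_r}x_2 & \cdots & x_{n_tn_r}^2 \end{bmatrix}.$$

It is easily seen that any two rows of the Hessian $\mathbf{G}(\boldsymbol{\lambda}')$ are linearly dependent, so $\mathrm{rank}(\mathbf{G}(\boldsymbol{\lambda}')) = 1$. $\mathbf{G}(\boldsymbol{\lambda}')$ has only one nonzero eigenvalue, which is $\frac{2\sum_i x_i^2}{(\sum_k \lambda'_kx_k)^3} > 0$ (the sum of the eigenvalues of a matrix equals the sum of its diagonal elements). All other eigenvalues are zero. Hence, the Hessian $\mathbf{G}(\boldsymbol{\lambda}')$ is a positive semidefinite matrix. Then $g(\boldsymbol{\lambda}')$ is a convex function on $\mathbb{R}_+^{n_tn_r} = \{(\lambda'_1, \ldots, \lambda'_{n_tn_r}) : \lambda'_i > 0, \text{ for } i = 1, \ldots, n_tn_r\}$.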

In  order  to  solve  the  above  optimization  problem,  we  employ  the  following  result  from  the 
theory  of  majorization  [40].  We  first  introduce  some  fundamental  concepts  of  majorization  that 
we  require  in  the  derivation  of  the  optimal  transmit  scheme. 

For any $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, let $x_{[1]} \ge \cdots \ge x_{[n]}$ denote the components of $\mathbf{x}$ in decreasing order.

Definition 2.3.1. For vectors $\mathbf{x}, \mathbf{y} \in \mathcal{A} \subset \mathbb{R}^n$, vector $\mathbf{y}$ majorizes $\mathbf{x}$ on $\mathcal{A}$ if

$$\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]}, \quad k = 1, \ldots, n-1, \qquad \text{and} \qquad \sum_{i=1}^{n} x_{[i]} = \sum_{i=1}^{n} y_{[i]}.$$

The notation $\mathbf{x} \prec \mathbf{y}$ means $\mathbf{x}$ is majorized by $\mathbf{y}$ on $\mathcal{A}$, or $\mathbf{y}$ majorizes $\mathbf{x}$ on $\mathcal{A}$.

Majorization makes precise the vague notion that the components of a vector $\mathbf{x}$ are less spread out or more nearly equal than the components of vector $\mathbf{y}$.

Definition 2.3.2. A real-valued function $f$ defined on a set $\mathcal{A} \subset \mathbb{R}^n$ is said to be Schur-convex on $\mathcal{A}$ if

$$\mathbf{x} \prec \mathbf{y} \implies f(\mathbf{x}) \le f(\mathbf{y}).$$

$f$ is Schur-concave if the above inequality is reversed. It follows that $f$ is Schur-convex on $\mathcal{A}$ if and only if $-f$ is Schur-concave on $\mathcal{A}$.

Lemma 2.3.2. If $X_1, \ldots, X_n$ are exchangeable random variables and the multi-variable, single-valued function $g$ is a symmetric Borel-measurable convex function, then the function

$$f(a_1, \ldots, a_n) = E[g(a_1X_1, \ldots, a_nX_n)]$$

is Schur-convex.

Since $h'_i$ are i.i.d., they are exchangeable random variables. Since $h'_i$ are exchangeable random variables and $g(\boldsymbol{\lambda}')$ is a symmetric Borel-measurable convex function, the function $E\left[\frac{1}{\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2}\right]$ is Schur-convex by the lemma.

Moreover, since $\left(\frac{P}{\psi_bn_t}, \ldots, \frac{P}{\psi_bn_t}\right)$ is majorized by $(\lambda'_1, \ldots, \lambda'_{n_tn_r})$ whenever $\lambda'_i \ge 0$, $\sum_i \lambda'_i = \frac{n_rP}{\psi_b}$, we know [40] that $E\left[\frac{1}{\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2}\right]$ is minimized with $\lambda'_1 = \lambda'_2 = \cdots = \lambda'_{n_tn_r} = \frac{P}{\psi_bn_t}$. We note that this choice of $\lambda'_i$, $i = 1, \ldots, n_tn_r$, also satisfies the constraints in the minimization problem in (2.42). Thus, it is also a solution to the original minimization problem, and the optimal training sequence matrix $\mathbf{A}$ should satisfy $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}$, which implies that the training sequences from different transmit antennas are orthogonal to each other and have equal powers. $\square$
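
Definition 2.3.1 translates directly into a short check. The sketch below is illustrative only; the function name `is_majorized_by` and the tolerance argument are our assumptions. It confirms on a small example that the equal-power vector is majorized by an unequal allocation with the same total, which is the property used at the end of the proof.

```python
def is_majorized_by(x, y, tol=1e-12):
    """Return True if x is majorized by y in the sense of Definition 2.3.1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    # The totals must agree.
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    # Partial sums of the largest k components, k = 1, ..., n-1.
    partial_x = partial_y = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):
        partial_x += a
        partial_y += b
        if partial_x > partial_y + tol:
            return False
    return True


equal = [0.25, 0.25, 0.25, 0.25]
unequal = [0.7, 0.1, 0.1, 0.1]
equal_is_least_spread = is_majorized_by(equal, unequal)
```

Because a Schur-convex function is order-preserving with respect to this relation, its minimum over allocations with a fixed total is attained at the equal vector.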

With  the  optimal  training  sequences,  we  can  provide  an  explicit  expression  for  the  average 
CRB  which  is  described  in  the  next  corollary. 

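Corollary 2.3.2 (Average CRB). Using the optimal training scheme, the average CRB over the Rayleigh flat-fading channel $\mathbf{h}$ is given by

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = -\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\rho^2(n_tn_r - 1)} \tag{2.45}$$

when $n_tn_r \ge 2$.

Proof. From Theorem 2.3.3 and its derivation, the average CRB under the optimal training scheme is given as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = -\frac{\sigma^2\psi_b^2n_t}{2\psi_aP}\,E_{\mathbf{h}}\left[\frac{1}{\sum_{i=1}^{n_tn_r}|h'_i|^2}\right], \tag{2.46}$$

where $h'_i$ are i.i.d. complex circular-symmetric Gaussian random variables with the $\mathcal{CN}(0, \rho^2)$ distribution.

Let $Y = \sum_{i=1}^{n_tn_r}|h'_i|^2$. Then $Y$ is $\chi^2_{2n_tn_r}$-distributed with the probability density function (p.d.f.)

$$f_Y(y) = \frac{1}{\rho^{2n_tn_r}\,\Gamma(n_tn_r)}\,y^{n_tn_r-1}e^{-y/\rho^2}, \quad y \ge 0. \tag{2.47}$$

Let $Z = 1/Y$. The p.d.f. of $Z$ is given as

$$f_Z(z) = \frac{1}{z^2}\,f_Y\!\left(\frac{1}{z}\right) = \frac{e^{-1/(\rho^2z)}}{\rho^{2n_tn_r}\,\Gamma(n_tn_r)\,z^{n_tn_r+1}}, \quad z > 0.$$

The expectation of $Z$ can be computed as

$$E(Z) = \int_0^{\infty} z\,f_Z(z)\,dz = \frac{1}{\rho^{2n_tn_r}\,\Gamma(n_tn_r)}\int_0^{\infty} e^{-1/(\rho^2z)}\,z^{-n_tn_r}\,dz. \tag{2.48}$$

When $n_tn_r \ge 2$ [45], we have

$$E(Z) = \frac{(n_tn_r - 2)!}{\rho^2\,(n_tn_r - 1)!} = \frac{1}{\rho^2(n_tn_r - 1)}. \tag{2.49}$$

Then from (2.46) and (2.49), the average CRB can be written in a simplified way as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = -\frac{\sigma^2\psi_b^2n_t}{2\psi_aP\rho^2(n_tn_r - 1)}. \tag{2.50}$$

$\square$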
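
The key step of the proof, $E(1/Y) = 1/(\rho^2(n_tn_r - 1))$, can be verified numerically. The following sketch is an illustration under our own assumptions: the function names are hypothetical, and the expectation is approximated by a truncated midpoint rule applied to $y^{-1}f_Y(y)$ rather than by the change of variables used above.

```python
import math


def mean_inverse_sum(n, rho2, steps=200000, upper=60.0):
    """Midpoint-rule estimate of E[1/Y], with Y a sum of n i.i.d. |CN(0, rho2)|^2 terms."""
    if n < 2:
        raise ValueError("E[1/Y] is infinite for n < 2")
    dy = upper * rho2 / steps          # integrate over [0, upper * rho2]
    norm = rho2 ** n * math.gamma(n)   # normalizing constant of f_Y
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * dy
        # (1/y) * f_Y(y) = y^(n-2) * exp(-y/rho2) / norm
        total += y ** (n - 2) * math.exp(-y / rho2) / norm * dy
    return total


def average_crb(sigma2, psi_a, psi_b, power, rho2, n_t, n_r):
    """Closed form (2.45) for orthogonal, equal-power training sequences."""
    n = n_t * n_r
    if n < 2:
        raise ValueError("the average CRB is finite only for n_t * n_r >= 2")
    return -sigma2 * psi_b ** 2 * n_t / (2.0 * psi_a * power * rho2 * (n - 1))
```

For example, with $n_t = 2$ and $n_r = 1$ the closed form equals $-\sigma^2\psi_b^2/(\psi_aP\rho^2)$, twice the threshold value appearing in Theorem 2.3.2.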

With the optimal (orthogonal) training sequences, the average CRB is a simple function of the constant $-\psi_b^2/\psi_a$, which only depends on the symbol waveform $\psi(t)$, the signal-to-noise ratio $P\rho^2/\sigma^2$, the number of transmit antennas $n_t$, and the number of receive antennas $n_r$. Note that the average CRB in the limit of large $n_t$ or large $n_r$ can be approximated as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] \approx -\frac{\sigma^2\psi_b^2}{2n_r\psi_aP\rho^2} = -\frac{\sigma^2\psi_b^2}{2n_r(\psi_b\psi_c + |\psi_d|^2)P\rho^2}, \tag{2.51}$$

which is inversely proportional to the number of receive antennas $n_r$. When $\psi(t)$ is symmetric about $T_s/2$, such as the time-domain raised-cosine pulse and the half-sine pulse, $\psi_d$ becomes zero. Then the average CRB for the estimation of the delay $\tau$ with orthogonal training signals can be written as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{\sigma^2n_t}{2\beta^2P\rho^2(n_tn_r - 1)}, \tag{2.52}$$

where

$$\beta = \left[-\frac{\psi_c}{\psi_b}\right]^{1/2} = \left[-\frac{\int \psi''(t)\psi^*(t)\,dt}{\int |\psi(t)|^2\,dt}\right]^{1/2} = \left[\frac{\int_{-\infty}^{+\infty}\omega^2|\Psi(\omega)|^2\,d\omega}{\int_{-\infty}^{+\infty}|\Psi(\omega)|^2\,d\omega}\right]^{1/2}$$

is known as the root-mean-square bandwidth [37] of the symbol waveform. Here $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. We note that the average CRB can be decreased by increasing the bandwidth of the symbol waveform.

As before, we would like to know whether the ML estimator can achieve the average CRB. Because the function $\frac{1}{\mathbf{h}^H(\mathbf{A}^H\mathbf{A} \otimes \mathbf{I}_{n_r})\mathbf{h}}$ is not a bounded function, unlike the case of the outage probability of the CRB, the ML estimator may not achieve the average CRB asymptotically (see further discussion in Section 2.5.1). However, the average CRB provides a lower bound for the variance of any unbiased timing estimator averaged over the channel realizations.

Again, we employ Monte-Carlo simulations to evaluate the performance of the ML estimator with a small number of receive antennas. In Fig. 2.4, we compare the mean squared error (MSE) achieved by the ML estimator and the average CRB given by (2.45) for a system with two transmit antennas and employing orthogonal training sequences. For a single receive antenna system, the performance of the ML estimator deviates significantly from the average CRB. This is due to the events in which all the channel coefficients are very small simultaneously, causing the estimation performance to be very poor. The large estimation errors caused by these events dominate the MSE of the ML estimator. We can see from the figure that the effect of these events diminishes as the number of receive antennas or the SNR increases. In the former case, the error dominating events become rarer as the number of receive antennas increases. In the latter case, the estimation errors, and hence the effect of the error dominating events, get smaller as the SNR increases. For a reasonably small value of $n_r$, e.g. 4, and a reasonably high SNR, e.g. 20 dB, we see that the average CRB is still a rather appropriate performance metric.
2.4     Timing  Estimation  with  Random  Channel 

Recently, differential space-time coding schemes [46, 47, 48] have been developed where channel estimates are not required at the receiver. For this situation, we only need to consider the estimation of the delay $\tau$. A reasonable model to represent this scenario is that the channel is random with known statistics.

Figure 2.4: Comparison of the MSE of the ML estimator obtained from simulation and the average CRB. The number of transmit antennas $n_t$ is 2. The unit in the vertical axis is $T_s^2$.
2.4.1     ML  Estimator 

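Recall that the conditional likelihood function $p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})$ of $\tilde{\mathbf{r}}(\tau)$ in terms of real vectors and matrices is given by (2.8). With the assumption of i.i.d. Rayleigh flat-fading channels between the transmit and receive antennas, we have $E[\mathbf{h}\mathbf{h}^H] = \rho^2\mathbf{I}_{n_tn_r}$ and $E[\tilde{\mathbf{h}}\tilde{\mathbf{h}}^T] = \frac{\rho^2}{2}\mathbf{I}_{2n_tn_r}$. The joint probability density function of the channel vector $\tilde{\mathbf{h}}$ is given as

$$p(\tilde{\mathbf{h}}) = \frac{1}{(\pi\rho^2)^{n_tn_r}}\exp\left\{-\frac{1}{\rho^2}\tilde{\mathbf{h}}^T\tilde{\mathbf{h}}\right\}. \tag{2.53}$$

We can average $p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})$ over all realizations of $\tilde{\mathbf{h}}$ to obtain the unconditional likelihood function as

$$p(\tilde{\mathbf{r}}(\tau)|\tau) = \int p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})\,p(\tilde{\mathbf{h}})\,d\tilde{\mathbf{h}} = \mathrm{const} \times \frac{1}{\sqrt{\det\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)}}\exp\left\{\frac{1}{\sigma^2}\tilde{\mathbf{r}}(\tau)^T\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\tilde{\mathbf{r}}(\tau)\right\}, \tag{2.54}$$

where we have used the integral result from Cramer [49, 11.12.1a]. The natural logarithm of $p(\tilde{\mathbf{r}}(\tau)|\tau)$ is the log-likelihood function:

$$\ln[p(\tilde{\mathbf{r}}(\tau)|\tau)] = \mathrm{const} + \frac{1}{\sigma^2}\tilde{\mathbf{r}}(\tau)^T\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\tilde{\mathbf{r}}(\tau). \tag{2.55}$$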

By using the relationship between real and complex matrices [36], the log-likelihood function can be written in terms of complex quantities as

$$\ln[p(\mathbf{r}(\tau)|\tau)] = \mathrm{const} + \frac{1}{\sigma^2}\mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*. \tag{2.56}$$

Hence the ML estimator for the delay $\tau$ is given by

$$\hat{\tau}_{ML} = \arg\max_{\tau}\ p(\mathbf{r}(\tau)|\tau) = \arg\max_{\tau}\left\{\mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*\right\}. \tag{2.57}$$

We assume that $\frac{\sigma^2}{\rho^2}$ is known to the receiver for the implementation of the ML estimator. We note that the matrix $\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}$ is always invertible. So unlike the restriction in Section 2.3, $\mathbf{C}$ can be singular, which implies that the training signals from different transmit antennas can be correlated with each other.

2.4.2     Cramer-Rao  Bound 

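The CRB for the timing estimation based on the random channel model is summarized in the following theorem.

Theorem 2.4.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the training signals $s_k(t)$ exist and they are uniformly continuous on $[0, T_o]$. Together with the standard regularity conditions [36, 37], the Cramer-Rao bound for the estimation of the delay $\tau$ over the i.i.d. Rayleigh flat-fading channel model is given by

$$\mathrm{CRB} = -\frac{\sigma^2}{2n_r\,\mathrm{tr}\{\mathbf{D}^{-1}\mathbf{G}\}}, \tag{2.58}$$

where $\mathbf{D} = \mathbf{\Gamma} + \frac{\sigma^2}{\rho^2}\mathbf{I}$ and the $(i, j)$th element of $\mathbf{G}$ is

$$G_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\dot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\dot{s}_j(t)\,dt\right) + \frac{1}{2}\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\ddot{s}_j(t)\,dt\right) + \frac{1}{2}\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\ddot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)s_j(t)\,dt\right), \tag{2.59}$$

for $i, j = 1, 2, \ldots, n_t$.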

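Proof. To derive the CRB, we start from the log-likelihood function in (2.56):

$$\ln[p(\mathbf{r}(\tau)|\tau)] = \mathrm{const} + \frac{1}{\sigma^2}\mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*.$$

The second derivative of the log-likelihood function $\ln[p(\mathbf{r}(\tau)|\tau)]$ w.r.t. $\tau$ is

$$\frac{\partial^2\ln[p(\mathbf{r}(\tau)|\tau)]}{\partial\tau^2} = \frac{2}{\sigma^2}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau} + \frac{1}{\sigma^2}\left[\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2}\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^* + \mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\right].$$

The expectation of the above is

$$E\left[\frac{\partial^2\ln[p(\mathbf{r}(\tau)|\tau)]}{\partial\tau^2}\right] = \frac{2}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}E\left[\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\right]\right\} + \frac{1}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\left(E\left[\mathbf{r}(\tau)^*\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2}\right] + E\left[\frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\mathbf{r}(\tau)^T\right]\right)\right\}. \tag{2.60}$$

Write $\frac{\partial\mathbf{r}(\tau)}{\partial\tau} = \left[\frac{\partial\mathbf{r}_1(\tau)^T}{\partial\tau}, \ldots, \frac{\partial\mathbf{r}_{n_t}(\tau)^T}{\partial\tau}\right]^T$, where the $i$th block can be computed as

$$\frac{\partial\mathbf{r}_i(\tau)}{\partial\tau} = \sum_{k=1}^{n_t}\mathbf{h}_k^*\int_0^{T_o} s_k^*(t-\tau)\frac{\partial s_i(t-\tau)}{\partial\tau}\,dt + \int_0^{T_o}\mathbf{n}^*(t)\frac{\partial s_i(t-\tau)}{\partial\tau}\,dt. \tag{2.61}$$

Then

$$\frac{\partial\mathbf{r}_i(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}_j(\tau)^T}{\partial\tau} = \left[\sum_{k=1}^{n_t}\mathbf{h}_k\int_0^{T_o} s_k(t-\tau)\frac{\partial s_i^*(t-\tau)}{\partial\tau}\,dt + \int_0^{T_o}\mathbf{n}(t)\frac{\partial s_i^*(t-\tau)}{\partial\tau}\,dt\right]\left[\sum_{k=1}^{n_t}\mathbf{h}_k^H\int_0^{T_o} s_k^*(t-\tau)\frac{\partial s_j(t-\tau)}{\partial\tau}\,dt + \int_0^{T_o}\mathbf{n}^H(t)\frac{\partial s_j(t-\tau)}{\partial\tau}\,dt\right].$$

Recall that the channel gain vector $\mathbf{h}$ is assumed to have i.i.d. complex circular-symmetric Gaussian elements, i.e., $E[\mathbf{h}_k\mathbf{h}_k^H] = \rho^2\mathbf{I}_{n_r}$, $E[\mathbf{h}_k\mathbf{h}_k^T] = \mathbf{0}$, and $E[\mathbf{h}_i\mathbf{h}_j^H] = E[\mathbf{h}_i\mathbf{h}_j^T] = \mathbf{0}$, $i \ne j$. Thus we have

$$E\left[\frac{\partial\mathbf{r}_i(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}_j(\tau)^T}{\partial\tau}\right] = \left\{\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\dot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\dot{s}_j(t)\,dt\right) + \sigma^2\int_0^{T_o}\dot{s}_i^*(t)\dot{s}_j(t)\,dt\right\}\mathbf{I}_{n_r}.$$

As a result, we have $E\left[\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\right] = \mathbf{P} \otimes \mathbf{I}_{n_r}$, where the $(i, j)$th element of $\mathbf{P}$ is given by

$$P_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\dot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\dot{s}_j(t)\,dt\right) + \sigma^2\int_0^{T_o}\dot{s}_i^*(t)\dot{s}_j(t)\,dt.$$

Similarly, we also have $E\left[\frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\mathbf{r}(\tau)^T\right] + E\left[\mathbf{r}(\tau)^*\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2}\right] = \mathbf{Q} \otimes \mathbf{I}_{n_r}$, where the $(i, j)$th element of $\mathbf{Q}$ is given by

$$Q_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\ddot{s}_j(t)\,dt\right) + \sigma^2\int_0^{T_o} s_i^*(t)\ddot{s}_j(t)\,dt + \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\ddot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)s_j(t)\,dt\right) + \sigma^2\int_0^{T_o}\ddot{s}_i^*(t)s_j(t)\,dt,$$

for $i, j = 1, 2, \ldots, n_t$.

Let $\mathbf{D} = \mathbf{\Gamma} + \frac{\sigma^2}{\rho^2}\mathbf{I}$, then

$$E\left[\frac{\partial^2\ln[p(\mathbf{r}(\tau)|\tau)]}{\partial\tau^2}\right] = \frac{2}{\sigma^2}\mathrm{tr}\{(\mathbf{D} \otimes \mathbf{I}_{n_r})^{-1}(\mathbf{P} \otimes \mathbf{I}_{n_r})\} + \frac{1}{\sigma^2}\mathrm{tr}\{(\mathbf{D} \otimes \mathbf{I}_{n_r})^{-1}(\mathbf{Q} \otimes \mathbf{I}_{n_r})\} = \frac{2}{\sigma^2}\mathrm{tr}\left\{(\mathbf{D}^{-1} \otimes \mathbf{I}_{n_r})\left(\left(\mathbf{P} + \tfrac{1}{2}\mathbf{Q}\right) \otimes \mathbf{I}_{n_r}\right)\right\} = \frac{2}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{D}^{-1}\left(\mathbf{P} + \tfrac{1}{2}\mathbf{Q}\right)\right) \otimes \mathbf{I}_{n_r}\right\} = \frac{2n_r}{\sigma^2}\mathrm{tr}\left\{\mathbf{D}^{-1}\left(\mathbf{P} + \tfrac{1}{2}\mathbf{Q}\right)\right\}, \tag{2.62}$$

where the second equality is obtained by using $(\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$ and the third equality is obtained from the property $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D})$ [50]. Let $\mathbf{G} = \mathbf{P} + \frac{1}{2}\mathbf{Q}$. Since

$$\int_0^{T_o} s_i^*(t-\tau)s_j(t-\tau)\,dt = \Gamma_{ij} = \mathrm{const},$$

differentiating the left side twice w.r.t. $\tau$ gives

$$2\int_0^{T_o}\dot{s}_i^*(t)\dot{s}_j(t)\,dt + \int_0^{T_o}\ddot{s}_i^*(t)s_j(t)\,dt + \int_0^{T_o} s_i^*(t)\ddot{s}_j(t)\,dt = 0.$$

Then using the above equality, the $(i, j)$th element of $\mathbf{G}$ becomes

$$G_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\dot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\dot{s}_j(t)\,dt\right) + \frac{1}{2}\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)\ddot{s}_j(t)\,dt\right) + \frac{1}{2}\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_o} s_k(t)\ddot{s}_i^*(t)\,dt\right)\left(\int_0^{T_o} s_k^*(t)s_j(t)\,dt\right) \tag{2.63}$$

for $i, j = 1, 2, \ldots, n_t$. As a result, we note that $\mathbf{G}$ does not depend on the noise variance $\sigma^2$. The CRB of the timing estimation is given as

$$\mathrm{CRB} = -\frac{1}{E\left[\frac{\partial^2\ln[p(\mathbf{r}(\tau)|\tau)]}{\partial\tau^2}\right]} = -\frac{\sigma^2}{2n_r\,\mathrm{tr}\{\mathbf{D}^{-1}\mathbf{G}\}}. \tag{2.64}$$

$\square$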

2.4.3     Optimal  Training  Scheme 

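In this section, we impose Assumptions 1 and 2 made in Section 2.3 on the form of the training signals. With these two assumptions, $G_{ij}$ can be simplified to

$$G_{ij} = \psi_a\rho^2\sum_{k=1}^{n_t}\mathbf{a}_i^H\mathbf{a}_k\,\mathbf{a}_k^H\mathbf{a}_j.$$

Hence we have $\mathbf{G} = \psi_a\rho^2\,\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}$. Thus the CRB for the timing estimation can be simplified as given in the following corollary.

Corollary 2.4.1. Given Assumptions 1 and 2, the Cramer-Rao bound for the estimation of the delay $\tau$ over the i.i.d. Rayleigh flat-fading channel model reduces to

$$\mathrm{CRB} = -\frac{\sigma^2}{2n_r\psi_a\rho^2\,\mathrm{tr}\left\{\left(\psi_b\mathbf{A}^H\mathbf{A} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}\right\}}. \tag{2.65}$$

Moreover, in terms of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_{n_t}$ of $\mathbf{A}^H\mathbf{A}$, we have

$$\mathrm{tr}\left\{\left(\psi_b\mathbf{A}^H\mathbf{A} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}\right\} = \sum_{i=1}^{n_t}\frac{\lambda_i^2}{\psi_b\lambda_i + \frac{\sigma^2}{\rho^2}}. \tag{2.66}$$

Thus the minimization of the CRB is equivalent to the following optimization problem:

$$\max\ \sum_{i=1}^{n_t}\frac{\lambda_i^2}{\psi_b\lambda_i + \frac{\sigma^2}{\rho^2}} \quad \text{subject to} \quad \sum_{i=1}^{n_t}\lambda_i \le \frac{P}{\psi_b}, \quad \lambda_i \ge 0, \quad i = 1, \ldots, n_t. \tag{2.67}$$

It can be easily verified that the cost function $\sum_{i=1}^{n_t}\frac{\lambda_i^2}{\psi_b\lambda_i + \sigma^2/\rho^2}$ is a convex function on $(\lambda_1, \ldots, \lambda_{n_t})$. Then the following theorem specifies the optimal training sequences [51].

Theorem 2.4.2. The CRB is minimized by choosing the training sequence matrix $\mathbf{A}$ such that $\lambda_1 = \frac{P}{\psi_b}$, and $\lambda_2 = \cdots = \lambda_{n_t} = 0$. That is, the optimal training sequences from different transmit antennas are perfectly correlated.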

We note that the rank of the optimal training sequence matrix $\mathbf{A}$ is 1. This implies that we
can choose an arbitrary subset of transmit antennas to transmit the training signals as long as the
training sequences from the chosen transmit antennas are perfectly correlated with each other.
A common choice is to use the same training sequence on every transmit antenna and to divide the
power equally among the antennas. With the optimal choice of training sequences, the corresponding
minimum CRB is given by:

On  the  other  hand,  when  orthogonal  training  signals  are  employed,  i.e.,  Ai  =  ■  •  •  =  A„,  =  -^^, 
the  CRB  is  maximized  to  the  value 

Contrary to the previous case of joint estimation of the channel and delay, where orthogonal
training sequences are optimal, orthogonal sequences are the worst choice in terms of the CRB for
estimating the delay under the random channel model. Fig. 2.5 compares the CRBs of the system with
the perfectly correlated training sequences and that with the orthogonal training sequences. Note
that the CRB of the system with the perfectly correlated training sequences is the same for any
number of transmit antennas. We see that the performance gain achieved by the optimal scheme
is most pronounced when the SNR is low. For any fixed $n_t$, the performance gap vanishes as the
SNR becomes sufficiently large.

In Fig. 2.6, we compare the MSE achieved by the ML estimator and the CRB given in
(2.68) for a system with two transmit antennas employing perfectly correlated training
sequences. The perfect correlation is obtained by using the same training sequence on both
antennas and dividing the power equally between them. As will be discussed in Section 2.5.2, no
knowledge of the signal-to-noise ratio is needed to implement the ML delay estimator for this
choice of perfectly correlated training sequences. We observe from the figure that for a reasonably
small value of $n_r$, e.g., 4, and a reasonably high SNR, e.g., 20 dB, the CRB is a tight lower bound
on the MSE performance of the ML estimator. This, together with the asymptotic achievability of
the CRB, suggests that it is an appropriate performance metric.

2.5     Discussions  and  Conclusions 

In the previous two sections, we have studied the problem of timing estimation in multiple-
antenna systems from two different approaches. In Section 2.3, the channel $\mathbf{h}$ is assumed to
be unknown but deterministic and joint ML estimation of $\mathbf{h}$ and the delay $\tau$ is performed. In
contrast, in Section 2.4, we assume that the channel is random but with known statistics and use
the likelihood function averaged over all channel realizations to construct the ML estimator for
the delay $\tau$. These two approaches lead to two different optimal training signal designs. For the
deterministic channel approach, we see that orthogonal training sequences minimize the outage
probability as well as the average CRB. For the random channel approach, perfectly correlated
training sequences minimize the CRB. Here we compare these two approaches in terms of the
resulting ML estimators, CRBs, and suitability of the outage and average CRB performance
measures.

Figure 2.5: Comparison of CRBs obtained using orthogonal training sequences and perfectly
correlated training sequences for different numbers of transmit antennas (horizontal axis: SNR
in dB). Note that the CRB of the system with the perfectly correlated training sequences is the
same for any number of transmit antennas.

Figure 2.6: Comparison of the MSE of the ML estimator obtained from simulation and the
CRB (horizontal axis: SNR in dB). The number of transmit antennas $n_t$ is 2. The unit in the
vertical axis is Tf .

2.5.1  Orthogonal Training Signals

When orthogonal training signals are employed, the ML estimators of the delay $\tau$
under the deterministic and random channel approaches both reduce to

$$\hat{\tau}_{\mathrm{ML}} = \arg\max_{\tau} \left\{ \mathbf{r}(\tau)^T \mathbf{r}(\tau)^* \right\}. \qquad (2.70)$$

Thus  the  equal  gain  combiner  for  the  received  signals  from  the  receive  antennas  is  the  ML 
estimator  for  both  approaches.  Under  the  deterministic  channel  approach,  the  average  CRB  has 
the  value 

EH[CRB(h)]  =  -     .      '^\.       ^.  (2.71) 

Under  the  random  channel  approach,  the  CRB  has  the  value 

As  discussed  before,  the  CRB  in  (2.72)  is  asymptotically  achievable  by  the  ML  estimator  when 
the  number  of  receive  antennas  goes  to  infinity.  In  addition,  the  limiting  ratio  between  (2.71) 
and  (2.72),  when  n^  approaches  infinity,  is  — ^  which  is  smaller  than  1.  This  implies  that 
the  average  CRB  in  (2.71)  is  not  achievable  by  the  ML  estimator  asymptotically  when  n,  ap- 
proaches infinity.  On  the  other  hand,  for  small  values  of  n^,  the  value  in  (2.71)  can  be  larger 
than  the  value  of  (2.72)  when  the  SNR  is  large  enough.  More  precisely,  this  happens  when 
^  >  ntinrUt  -  1).  Thus  in  this  case,  the  average  CRB  in  (2.71)  actually  gives  a  tighter  bound 
on  the  performance  of  the  ML  estimator.  The  simulation  results  in  Fig.  2.4  are  in  agreement 
with  this  observation. 

In this sense, the average CRB may not be as good a performance measure as the outage
probability in the deterministic channel approach since the latter is asymptotically achievable,
starting at very small values of $n_r$, by the ML estimator. However, for small values of $n_r$ and at
high SNR, the average CRB may still be a reasonable performance metric.

2.5.2  Perfectly  Correlated  Training  Signals 

Under  the  random  channel  approach  employing  perfectly  correlated  training  signals,  we 
have  A^A  =  -^qq^  where  q  is  an  arbitrary  Ut  x  1  vector  with  q-^q  =  n,.   For  instance. 
$\mathbf{q} = [1, 1, \ldots, 1]^T$ when we use the same training sequence and assign equal power to each
transmit antenna. By using the matrix inversion formula, the ML delay estimator for this choice
of perfectly correlated sequences reduces to exactly the same form as the one for orthogonal
training sequences given in (2.70). We note that knowledge of the SNR is not needed
to implement the above ML estimator. Comparing the results in Figs. 2.4 and 2.6, the MSE
obtained by the ML estimator with the perfectly correlated training sequences is smaller than
that obtained by the ML estimator with orthogonal training sequences for all cases considered in
the simulation studies. This observation is in agreement with the training sequence optimization
result based on the CRB that the perfectly correlated sequences are superior to the orthogonal
sequences under the random channel approach.

For a general choice of training sequences, however, the SNR information is needed to implement
the ML estimator. We also note that perfectly correlated training signals are not applicable in
the deterministic channel approach since they cannot be used to estimate the channel vector $\mathbf{h}$.

2.5.3     Deterministic vs Random Channel Approaches

The results and discussions in the previous sections provide some guidelines on whether
to use the deterministic or the random channel approach in estimating the timing parameter. If
the design consideration is the outage probability, i.e., neglecting a small percentage of the
worst-case channel realizations, one would employ the deterministic channel approach with
orthogonal training signals. On the other hand, if the average estimation error (over all channel
realizations) is the main design criterion, one would employ the random channel approach
with perfectly correlated training signals. We note that the perfectly correlated training signals
cannot be used for channel estimation. Thus they may be more suitable for space-time coding
schemes that do not require the channel information. In addition, the advantage of the perfectly
correlated training signals over orthogonal signals vanishes at high SNR in the random channel
approach. Thus when the number of transmit antennas is not very large and the SNR is high, one
could employ orthogonal training signals for either of the two approaches.


CHAPTER  3 
CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED INTERFERENCE

3.1     Introduction 

Many multiple antenna communication systems are designed for coherent detection, which
requires channel state information (CSI) in the demodulation process. For practical wireless
communication systems, it is common that the channel parameters are estimated by sending
known training symbols to the receiver. The performance of this kind of training-based channel
estimation scheme depends on the design of the training signals, which has been extensively
investigated in the literature.

It is well known that imperfect knowledge of the channel has a detrimental effect on the
achievable rate it can sustain [52]. Training sequences can be designed based on information
theoretic metrics such as the ergodic capacity and outage capacity of a MIMO channel [42,
53, 54]. The mean square error (MSE) is another commonly used performance measure for
channel estimation. Many works [55-65] have been carried out to investigate the training sequence
design problem based on MSE for MIMO fading channels. In Wong et al. [61], the authors
studied the problem of training sequence design for multiple-antenna systems over flat fading
MIMO channels in the presence of colored interference. The MIMO channels are assumed to
be spatially white, i.e., there is no correlation among the transmit and receive antennas. The
optimal training sequences were designed to minimize the channel estimation MSE under a total
transmit power constraint. The optimal training sequence design result implied that we should
intentionally assign transmit power to the subspace with less interference. A practical algorithm
for estimating the long-term second-order statistics of the interference correlation matrix and
an efficient scheme for feeding back necessary information to the transmitter for constructing
the optimal training sequences were also proposed. In Kotecha et al. [62], the problem of
transmit signal design was investigated for the estimation of spatially correlated MIMO Rayleigh
flat fading channels. The optimal training signal was designed to optimize two criteria: the
minimization of the channel estimation MSE and the maximization of the conditional mutual
information (CMI) between the channel and the received signal. The authors adopted the virtual
channel representation model [66] for MIMO correlated channels. It was shown that the optimal
training signal should be transmitted along the strong transmit eigen-directions in which more
scatterers are present. The powers transmitted along these eigen-directions are determined by
the water-filling solutions based on the minimum MSE and maximum CMI criteria. In Cai
et al. [65], the space-time spreading scheme, block coding scheme and channel estimation
for correlated fading channels in the presence of interference were studied. The authors
focused on the single receive antenna case and extended their results to the multiple receive
antenna case where the receive antennas were assumed to be uncorrelated. Based on the previous
optimization results for the special case [63] where there was no interference, the space-time
beamforming (STBF) matrix was chosen as the training symbol matrix for the linear MMSE
channel estimator. Then the optimal power loading scheme was designed for the training symbol
matrices in this particular set.

In this chapter, we investigate the problem of estimating correlated MIMO channels with
colored interference. We adopt the correlated MIMO channel model [21, 67] which expresses
the channel matrix as a product of the receive correlation matrix, a white matrix with independent
and identically distributed (i.i.d.) entries, and the transmit correlation matrix. This model
implies that the transmit and receive correlations can be separated, a property that has been
verified by field measurements. The colored interference model used here is more suitable than
the white noise model when jamming signals and/or co-channel interference are present in the
wireless communication system. We consider an interference limited wireless communication
system, and assume that the thermal noise is small relative to the interference and can be ignored.
Then we show that the covariance matrix of the interference has a Kronecker product form, which
implies that the temporal and spatial correlations of the interference are separable. The channel
estimation MSE is used as a performance metric for the design of training sequences. The
optimization problem encountered here, which minimizes the channel estimation MSE under a power
constraint, is a generalization of two previous optimization problems which are encountered widely
in the signal processing area [61, 63, 64, 68]. We first analyze the optimal structure of the
solution by using the Lagrangian method, and then find the optimal power allocation scheme, which
has a water-filling interpretation. Finally we determine the optimal ordering for the related
eigenvector matrices. In Cai et al. [65], the authors encountered essentially the same
optimization problem but in a different form. Based on the previous optimization results for the
special case [63], the authors chose to optimize the training sequence matrix in a particular set
of matrices which have the same solution structure and eigenvector ordering as our solution.
Here we rigorously prove that this particular solution structure and eigenvector ordering
are optimal for arbitrary matrices satisfying the power constraint. The design of the optimal
training sequences has a clear physical interpretation: we should assign more power to
the transmission direction constructed by the eigen-direction with larger channel gains and the
interference subspace with less interference. In order to implement the channel estimator and
construct the optimal training sequences, we propose an algorithm to estimate long-term channel
statistics and design an efficient feedback scheme so that we can approximately construct
the optimal sequences at the transmitter. Numerical results show that with the optimal training
sequences, the channel estimation MSE can be reduced substantially when compared with the
use of other training sequences.

The chapter is organized in the following manner. The system model and linear MMSE
channel estimator that we consider are introduced in Section 3.2. In Section 3.3, the optimal
training sequence is designed based on minimizing the total channel estimation MSE. In Section
3.4, an algorithm for the estimation of long-term characteristics of the channel is proposed
and an efficient feedback scheme is designed. Numerical results are provided in Section 3.5.
Conclusions are drawn in Section 3.6.

3.2     System  Model 

We consider a single user link with multiple interferers. The desired user has $n_t$ transmit
antennas and $n_r$ receive antennas. We assume that there are $M$ interfering signals and the $i$th
interferer has $n_i$ transmit antennas. The MIMO channel is assumed to be quasi-static (block
fading) in that it varies slowly enough to be considered invariant over a block. However, the
channel changes to independent values from block to block. We assume that the users employ
a frame-based transmission protocol which comprises training and payload data. The received
baseband signals at the receive antennas during the training period are given in matrix form by

$$\mathbf{Y} = \mathbf{H}\mathbf{S}^T + \underbrace{\sum_{i=1}^{M} \mathbf{H}_i \mathbf{S}_i^T}_{\mathbf{E}} = \mathbf{H}\mathbf{S}^T + \mathbf{E} = \mathbf{H}\mathbf{S}^T + \sum_{i=1}^{M} \mathbf{E}_i. \qquad (3.1)$$

The $n_r \times n_t$ matrix $\mathbf{H}$ and the $n_r \times n_i$ matrix $\mathbf{H}_i$ are the channel gain matrices from the
transmitter and the $i$th interferer to the receiver, respectively. $\mathbf{S}$ is the $N \times n_t$ training symbol
matrix known to the receiver for estimating the channel gain matrix $\mathbf{H}$ of the desired user during
the training period. $N$ is the number of training symbols from each transmit antenna and $N$ is
usually much larger than $n_t$. $\mathbf{S}_i$ is the $N \times n_i$ interference symbol matrix from the $i$th interferer.
We assume that the elements in $\mathbf{S}_i$ are identically distributed zero-mean complex random variables,
correlated across both space and time. The interference processes are assumed to be wide-sense
stationary in time. We consider an interference limited wireless communication system. Hence
we assume that the thermal noise is small relative to the interference and can be ignored [69].

We adopt the correlated MIMO channel model [21, 67] which models the channel gain
matrix $\mathbf{H}$ as:

$$\mathbf{H} = \mathbf{R}_r^{1/2} \mathbf{H}_w \mathbf{R}_t^{1/2} \qquad (3.2)$$

where $\mathbf{R}_t$ models the correlation between the transmit antennas and $\mathbf{R}_r$ models the correlation
between the receive antennas. The notation $(\cdot)^{1/2}$ stands for the Hermitian square
root of a matrix. $\mathbf{H}_w$ is a matrix whose elements are independent and identically distributed
zero-mean circularly symmetric complex Gaussian random variables with unit variance. Let
$\mathbf{h}_w = \mathrm{vec}(\mathbf{H}_w)$, where $\mathrm{vec}(\mathbf{X})$ is the vector obtained by stacking the columns of $\mathbf{X}$ on top of
each other; then we have

$$\mathbf{h} = \mathrm{vec}(\mathbf{H}) = (\mathbf{R}_t^{1/2} \otimes \mathbf{R}_r^{1/2})\, \mathbf{h}_w, \qquad (3.3)$$

with $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_t \otimes \mathbf{R}_r)$ where $\mathcal{CN}$ denotes the complex Gaussian distribution. Similarly, we
model the channel gain matrix from the $i$th interferer to the receiver as:

$$\mathbf{H}_i = \mathbf{R}_r^{1/2} \mathbf{H}_{w_i} \mathbf{R}_{t_i}^{1/2} \qquad (3.4)$$
and

$$\mathbf{h}_i = \mathrm{vec}(\mathbf{H}_i) = (\mathbf{R}_{t_i}^{1/2} \otimes \mathbf{R}_r^{1/2})\, \mathbf{h}_{w_i}. \qquad (3.5)$$

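Using the vec operator, we can write the received signal in vector form as

$$\begin{aligned}
\mathbf{y} &= \mathrm{vec}(\mathbf{Y}) \\
&= (\mathbf{S} \otimes \mathbf{I}_{n_r})\,\mathrm{vec}(\mathbf{H}) + \sum_{i=1}^{M} (\mathbf{S}_i \otimes \mathbf{I}_{n_r})\,\mathrm{vec}(\mathbf{H}_i) \\
&= (\mathbf{S} \otimes \mathbf{I}_{n_r})\,\mathbf{h} + \sum_{i=1}^{M} (\mathbf{S}_i \otimes \mathbf{I}_{n_r})(\mathbf{R}_{t_i}^{1/2} \otimes \mathbf{R}_r^{1/2})\,\mathbf{h}_{w_i} \\
&= (\mathbf{S} \otimes \mathbf{I}_{n_r})\,\mathbf{h} + \sum_{i=1}^{M} \mathbf{e}_i \\
&= (\mathbf{S} \otimes \mathbf{I}_{n_r})\,\mathbf{h} + \mathbf{e} \qquad (3.6)
\end{aligned}$$

where $\mathbf{I}_{n_r}$ denotes the $n_r \times n_r$ identity matrix.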
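
The vectorization step used above rests on the identity $\mathrm{vec}(\mathbf{H}\mathbf{S}^T) = (\mathbf{S} \otimes \mathbf{I}_{n_r})\mathrm{vec}(\mathbf{H})$. The following standard-library sketch checks it on a small integer example. The helper names and the example matrices are illustrative assumptions, not part of the original text.

```python
# Numerical check of vec(H S^T) = (S kron I) vec(H) on a toy example.
# All helper names and the test matrices are hypothetical.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def vec(a):
    """Stack the columns of a on top of each other."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

def matvec(a, x):
    return [sum(r * v for r, v in zip(row, x)) for row in a]

if __name__ == "__main__":
    H = [[1, 2], [3, 4]]            # n_r x n_t channel (toy values)
    S = [[1, 0], [2, 1], [0, 3]]    # N x n_t training matrix (toy values)
    left = vec(matmul(H, transpose(S)))
    right = matvec(kron(S, identity(2)), vec(H))
    print(left, right)
```

Both sides are lists of length $N n_r$; the same helpers work with Python complex numbers if complex-valued test matrices are preferred.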

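To derive the linear MMSE channel estimator, we need the following lemma.

Lemma 3.2.1. $E(\mathbf{e}) = \mathbf{0}$ and the covariance matrix of $\mathbf{e}$ is

$$E(\mathbf{e}\mathbf{e}^H) = \sum_{i=1}^{M} \mathbf{Q}_{N_i} \otimes \mathbf{R}_r = \mathbf{Q}_N \otimes \mathbf{R}_r \qquad (3.7)$$

where

$$\mathbf{Q}_{N_i} = \begin{bmatrix} \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(0) & \cdots & \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(1-N) \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(N-1) & \cdots & \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(0) \end{bmatrix} \qquad (3.8)$$

and $R^i_{\tilde{s}_k}(\tau)$ represents the time correlation between the signals at time instants $m$ and $m + \tau$
from the $k$th antenna of the $i$th interferer.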

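Proof. Since $\mathbf{h}_{w_i} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{n_r n_i})$, $E(\mathbf{e}_i) = \mathbf{0}$. Then we have $E(\mathbf{e}) = \mathbf{0}$.
The received signal from the $i$th interferer can be written as

$$\mathbf{E}_i = \mathbf{H}_i \mathbf{S}_i^T = \mathbf{R}_r^{1/2} \mathbf{H}_{w_i} \underbrace{\mathbf{R}_{t_i}^{1/2} \mathbf{S}_i^T}_{\tilde{\mathbf{S}}_i^T} = \mathbf{R}_r^{1/2} \mathbf{H}_{w_i} \tilde{\mathbf{S}}_i^T. \qquad (3.9)$$

Since $\mathbf{S}_i$ is wide-sense stationary in time, $\tilde{\mathbf{S}}_i$ is also wide-sense stationary in time.

Using the vec operator, we can rewrite the interfering signal from the $i$th interferer as

$$\mathbf{e}_i = \mathrm{vec}(\mathbf{E}_i) = (\mathbf{I}_N \otimes \mathbf{R}_r^{1/2})\,\mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T). \qquad (3.10)$$

The covariance matrix of $\mathbf{e}_i$ is given as

$$\begin{aligned}
E(\mathbf{e}_i\mathbf{e}_i^H) &= E[(\mathbf{I}_N \otimes \mathbf{R}_r^{1/2})\,\mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T)\,\mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T)^H (\mathbf{I}_N \otimes \mathbf{R}_r^{1/2})^H] \\
&= (\mathbf{I}_N \otimes \mathbf{R}_r^{1/2})\, E[\mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T)\,\mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T)^H]\, (\mathbf{I}_N \otimes \mathbf{R}_r^{1/2}). \qquad (3.11)
\end{aligned}$$

Let $\mathbf{e}'_i = \mathrm{vec}(\mathbf{H}_{w_i}\tilde{\mathbf{S}}_i^T)$; we can show that the covariance matrix of $\mathbf{e}'_i$ is

$$E(\mathbf{e}'_i \mathbf{e}'^{H}_i) = \begin{bmatrix} \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(0)\,\mathbf{I}_{n_r} & \cdots & \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(1-N)\,\mathbf{I}_{n_r} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(N-1)\,\mathbf{I}_{n_r} & \cdots & \sum_{k=1}^{n_i} R^i_{\tilde{s}_k}(0)\,\mathbf{I}_{n_r} \end{bmatrix} = \mathbf{Q}_{N_i} \otimes \mathbf{I}_{n_r} \qquad (3.12)$$

where $R^i_{\tilde{s}_k}(\tau)$ represents the time correlation between the signals at time instants $m$ and $m + \tau$
from the $k$th antenna of the $i$th interferer. Then we have

$$\begin{aligned}
E[\mathbf{e}_i\mathbf{e}_i^H] &= (\mathbf{I}_N \otimes \mathbf{R}_r^{1/2})(\mathbf{Q}_{N_i} \otimes \mathbf{I}_{n_r})(\mathbf{I}_N \otimes \mathbf{R}_r^{1/2}) \\
&= \mathbf{Q}_{N_i} \otimes \mathbf{R}_r. \qquad (3.13)
\end{aligned}$$

The covariance matrix of $\mathbf{e}$ is then given as

$$E[\mathbf{e}\mathbf{e}^H] = \sum_{i=1}^{M} \mathbf{Q}_{N_i} \otimes \mathbf{R}_r = \mathbf{Q}_N \otimes \mathbf{R}_r. \qquad (3.14)$$

□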

We note that $\mathbf{Q}_N$ captures the temporal correlation of the interference and $\mathbf{R}_r$ represents
the spatial correlation. The covariance matrix of the interference has a Kronecker product form,
which implies that the temporal and spatial correlations of the interference are separable.
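
We notice that (3.6) represents a linear model. Based on the Bayesian Gauss-Markov
Theorem [36], the linear minimum mean square error (LMMSE) estimator for $\mathbf{h}$ is given as:

$$\begin{aligned}
\hat{\mathbf{h}} &= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1} \otimes \mathbf{R}_r](\mathbf{S}^H \otimes \mathbf{I}_{n_r})(\mathbf{Q}_N^{-1} \otimes \mathbf{R}_r^{-1})\,\mathbf{y} \\
&= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\mathbf{S}^H\mathbf{Q}_N^{-1} \otimes \mathbf{I}_{n_r}]\,\mathbf{y}. \qquad (3.15)
\end{aligned}$$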
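
For readers who want the intermediate steps, the following worked derivation sketches how the general Bayesian linear-model estimator specializes to the Kronecker form of the estimator. It assumes, as in the system model, a zero-mean channel vector with covariance $\mathbf{R}_t \otimes \mathbf{R}_r$ that is uncorrelated with the interference, whose covariance is $\mathbf{Q}_N \otimes \mathbf{R}_r$; the shorthand symbols $\mathbf{A}$, $\mathbf{C}_h$ and $\mathbf{C}_e$ are introduced here only for this sketch.

```latex
\begin{align*}
\mathbf{y} &= \mathbf{A}\mathbf{h} + \mathbf{e}, \qquad
\mathbf{A} = \mathbf{S}\otimes\mathbf{I}_{n_r},\quad
\mathbf{C}_h = \mathbf{R}_t\otimes\mathbf{R}_r,\quad
\mathbf{C}_e = \mathbf{Q}_N\otimes\mathbf{R}_r,\\
\hat{\mathbf{h}}
 &= \left(\mathbf{C}_h^{-1} + \mathbf{A}^H\mathbf{C}_e^{-1}\mathbf{A}\right)^{-1}
    \mathbf{A}^H\mathbf{C}_e^{-1}\mathbf{y}\\
 &= \left[\mathbf{R}_t^{-1}\otimes\mathbf{R}_r^{-1}
    + (\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S})\otimes\mathbf{R}_r^{-1}\right]^{-1}
    (\mathbf{S}^H\mathbf{Q}_N^{-1}\otimes\mathbf{R}_r^{-1})\,\mathbf{y}\\
 &= \left[(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S}+\mathbf{R}_t^{-1})^{-1}\otimes\mathbf{R}_r\right]
    (\mathbf{S}^H\mathbf{Q}_N^{-1}\otimes\mathbf{R}_r^{-1})\,\mathbf{y}\\
 &= \left[(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S}+\mathbf{R}_t^{-1})^{-1}
    \mathbf{S}^H\mathbf{Q}_N^{-1}\otimes\mathbf{I}_{n_r}\right]\mathbf{y}.
\end{align*}
```

Each step uses only the mixed-product rule $(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D}) = \mathbf{A}\mathbf{C}\otimes\mathbf{B}\mathbf{D}$ and $(\mathbf{A}\otimes\mathbf{B})^{-1} = \mathbf{A}^{-1}\otimes\mathbf{B}^{-1}$; the factor $\mathbf{R}_r\mathbf{R}_r^{-1}$ collapses to the identity in the last line.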

Using the equality $\mathrm{vec}(\mathbf{A}\mathbf{Y}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{Y})$, we can rewrite the channel estimator
in the compact matrix form as

H  =  YQ5^^S(S'^Q;^lS  +  R^')-^  (3.16) 

Hence the channel estimator does not depend on the receive channel correlation matrix $\mathbf{R}_r$.

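The performance of the channel estimator is measured by the estimation error $\boldsymbol{\epsilon} = \mathbf{h} - \hat{\mathbf{h}}$
whose mean is zero and whose covariance matrix is

$$\begin{aligned}
\mathbf{C}_{\epsilon} &= E[(\mathbf{h} - \hat{\mathbf{h}})(\mathbf{h} - \hat{\mathbf{h}})^H] \\
&= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S}) \otimes \mathbf{R}_r^{-1} + \mathbf{R}_t^{-1} \otimes \mathbf{R}_r^{-1}]^{-1} \\
&= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1}) \otimes \mathbf{R}_r^{-1}]^{-1} \\
&= (\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1} \otimes \mathbf{R}_r \qquad (3.17)
\end{aligned}$$

where the third equality is due to $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = \mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D}$ and $(\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$.
The diagonal elements of the error covariance matrix $\mathbf{C}_{\epsilon}$ yield the minimum Bayesian MSE.
The total MSE is the commonly used performance measure for MIMO channel estimation. By
using the fact that $\mathrm{tr}(\mathbf{A} \otimes \mathbf{B}) = \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$, we have

$$\mathrm{tr}(\mathbf{C}_{\epsilon}) = \mathrm{tr}((\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1} \otimes \mathbf{R}_r) = \mathrm{tr}((\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1})\,\mathrm{tr}(\mathbf{R}_r).$$

Thus the minimization of the total MSE over training sequences does not depend on the receive
channel correlation matrix. Only the temporal interference correlation matrix $\mathbf{Q}_N$ and the
transmit correlation matrix $\mathbf{R}_t$ need to be considered in obtaining the optimal training sequences.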
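
The separation argument relies on $\mathrm{tr}(\mathbf{A} \otimes \mathbf{B}) = \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$: whatever the receive correlation is, it only scales the total MSE by $\mathrm{tr}(\mathbf{R}_r)$ and cannot change which training design is better. A minimal standard-library sketch, with made-up integer matrices standing in for two candidate error matrices and a receive correlation matrix (all names and values are hypothetical):

```python
# Toy check that the total MSE tr(X kron R_r) factors as tr(X) * tr(R_r),
# so ranking candidate designs by tr(X) alone is enough.

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def total_mse(x, r_r):
    """Trace of the Kronecker-structured error covariance X kron R_r."""
    return trace(kron(x, r_r))

if __name__ == "__main__":
    x1 = [[2, 1], [1, 3]]     # stand-in for one design's (S^H Q^-1 S + R_t^-1)^-1
    x2 = [[1, 0], [0, 2]]     # stand-in for a second design
    r_r = [[2, 1], [1, 2]]    # stand-in receive correlation matrix
    print(total_mse(x1, r_r), trace(x1) * trace(r_r))
    print(total_mse(x2, r_r), trace(x2) * trace(r_r))
```

In this toy case the second design has the smaller total MSE for any choice of the receive correlation matrix with positive trace, since only the first factor differs between the two designs.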

3.3     Optimal Training Sequence Design

In this section, we investigate the problem of optimal training sequence design for channel
estimation. With the total Bayesian MSE as the performance measure, the optimization of
training sequences can be formulated as follows

$$\min_{\mathbf{S}} \ \mathrm{tr}(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1} \qquad (3.18)$$

$$\text{subject to} \quad \mathrm{tr}\{\mathbf{S}^H\mathbf{S}\} \le P$$

where $\mathrm{tr}\{\mathbf{S}^H\mathbf{S}\} \le P$ specifies the power constraint.

The optimization problem itself is of great interest to researchers in the signal processing
and communication areas. Its special cases (with either $\mathbf{Q}_N$ or $\mathbf{R}_t$ equal to the identity matrix)
have been encountered widely in joint linear transmitter-receiver design [63, 68, 70], training
sequence design for channel estimation in multiple antenna communication systems [61, 64],
and spreading sequence optimization for code division multiple access (CDMA) communication
systems [71].

The solution in the special case $\mathbf{R}_t = \mathbf{I}$, found for example in Wong et al. [61] and
Scaglione et al. [68], can be expressed in terms of the eigenvalues and eigenvectors of $\mathbf{Q}_N$ and
a Lagrange multiplier associated with the power constraint. Similarly, the solution in the special
case $\mathbf{Q}_N = \mathbf{I}$, found for example in Zhou et al. [63] and Biguesh et al. [64], can be expressed in
terms of the eigenvalues and eigenvectors of $\mathbf{R}_t$ and a Lagrange multiplier associated with the
power constraint. The optimization of the generalized mean square error problem introduced
here is more difficult. We will show that (3.18) has a solution that can be expressed as
$\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ where $\mathbf{U}$ and $\mathbf{V}$ are orthonormal matrices of eigenvectors for $\mathbf{Q}_N$ and $\mathbf{R}_t$
respectively, and $\boldsymbol{\Sigma}$ is diagonal. Solving (3.18) involves computing diagonalizations of $\mathbf{Q}_N$ and
$\mathbf{R}_t$, and finding an ordering for the columns of $\mathbf{U}$ and $\mathbf{V}$. In Cai et al. [65], the authors
encountered essentially the same optimization problem but in a different form. Based on the
previous optimization results for the special case [63], the authors chose to optimize the training
sequence matrix in a particular set of matrices which have the same solution structure and
eigenvector ordering as our solution. Here we rigorously prove that this particular solution
structure and eigenvector ordering are optimal for arbitrary matrices satisfying the power constraint.

A related optimization problem which minimizes the trace of the mean square error matrix
in a variant form is discussed in Section 3.7.1, and another optimization problem which
maximizes the determinant of the inverse of the mean square error matrix is introduced in Section
3.7.2.

We  solve  the  optimization  problem  (3.18)  in  three  steps.   First,  we  analyze  the  optimal 
structure  of  the  solution  by  using  the  Lagrangian  method,  then  find  the  optimal  power  allocation 
scheme,  and  finally  determine  the  optimal  ordering  for  the  related  eigenvector  matrices. 
3.3.1     Solution  Structure 

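We begin by analyzing the structure of an optimal solution to (3.18). Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and
$\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$ where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal
eigenvectors. Let $\lambda_j$, $1 \le j \le N$, and $\delta_i$, $1 \le i \le n_t$, denote the diagonal elements of $\boldsymbol{\Lambda}$ and
$\boldsymbol{\Delta}$, respectively. We assume that the eigenvalues $\{\lambda_j\}$ are arranged in increasing order, and
$\{\delta_i\}$ are arranged in decreasing order:

$$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N \quad \text{and} \quad \delta_1 \ge \delta_2 \ge \cdots \ge \delta_{n_t} > 0. \qquad (3.19)$$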

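Let us define

$$\mathbf{T} = \mathbf{U}^H\mathbf{S}\mathbf{V}. \qquad (3.20)$$

Substituting $\mathbf{S} = \mathbf{U}\mathbf{T}\mathbf{V}^H$ in (3.18) gives the following equivalent optimization problem:

$$\min_{\mathbf{T}} \ \mathrm{tr}(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1} \qquad (3.21)$$

$$\text{subject to} \quad \mathrm{tr}(\mathbf{T}^H\mathbf{T}) \le P, \quad \mathbf{T} \in \mathbb{C}^{N \times n_t}.$$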
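
The equivalence of the two problems follows from the unitary invariance of the trace. A short worked check, assuming only that $\mathbf{U}$ ($N \times N$) and $\mathbf{V}$ ($n_t \times n_t$) are unitary with $\mathbf{Q}_N = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{R}_t = \mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$:

```latex
\begin{align*}
\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1}
 &= \mathbf{V}\mathbf{T}^H\mathbf{U}^H\,\mathbf{U}\boldsymbol{\Lambda}^{-1}\mathbf{U}^H\,\mathbf{U}\mathbf{T}\mathbf{V}^H
    + \mathbf{V}\boldsymbol{\Delta}^{-1}\mathbf{V}^H
  = \mathbf{V}\left(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1}\right)\mathbf{V}^H,\\
\mathrm{tr}\left(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1}\right)^{-1}
 &= \mathrm{tr}\left[\mathbf{V}\left(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1}\right)^{-1}\mathbf{V}^H\right]
  = \mathrm{tr}\left(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1}\right)^{-1},\\
\mathrm{tr}\left(\mathbf{S}^H\mathbf{S}\right)
 &= \mathrm{tr}\left(\mathbf{V}\mathbf{T}^H\mathbf{U}^H\mathbf{U}\mathbf{T}\mathbf{V}^H\right)
  = \mathrm{tr}\left(\mathbf{T}^H\mathbf{T}\right).
\end{align*}
```

Both the cost and the power constraint are therefore unchanged by the change of variables, so any minimizer $\mathbf{T}$ of the transformed problem yields a minimizer $\mathbf{S} = \mathbf{U}\mathbf{T}\mathbf{V}^H$ of the original one.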

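We now show that the solution to (3.21) has at most one nonzero entry in each row and column.

Theorem 3.3.1. There exists a solution of (3.21) of the form $\mathbf{T} = \boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2$ where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$
are permutation matrices and $\sigma_{ij} = 0$ for all $i \ne j$.

Proof. We first prove the theorem under the following nondegeneracy assumption:

$$\delta_i \ne \delta_j > 0 \quad \text{and} \quad \lambda_i \ne \lambda_j > 0 \quad \text{for all } i \ne j. \qquad (3.22)$$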
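Since the cost function of (3.21) is a continuous function of $\boldsymbol{\Delta}$ and $\boldsymbol{\Lambda}$, and since any
$\boldsymbol{\lambda} > \mathbf{0}$ and $\boldsymbol{\delta} > \mathbf{0}$ can be approximated arbitrarily closely by vectors $\boldsymbol{\delta}$ and $\boldsymbol{\lambda}$ satisfying the
nondegeneracy conditions (3.22), we conclude that the theorem holds for arbitrary $\boldsymbol{\lambda} > \mathbf{0}$ and
$\boldsymbol{\delta} > \mathbf{0}$.

There exists an optimal solution of (3.21) since the feasible set is compact and the cost
function is a continuous function of $\mathbf{T}$. Since the eigenvalues of $\boldsymbol{\Delta}^{1/2}\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T}\boldsymbol{\Delta}^{1/2}$ are
nonnegative, it follows that for any choice of $\mathbf{T}$,

$$\mathrm{tr}(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1} = \mathrm{tr}\,\boldsymbol{\Delta}(\boldsymbol{\Delta}^{1/2}\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T}\boldsymbol{\Delta}^{1/2} + \mathbf{I})^{-1} \le \mathrm{tr}(\boldsymbol{\Delta}),$$

with equality when $\mathbf{T} = \mathbf{0}$. Hence, there exists a nonzero optimal solution of (3.21), which is
denoted $\hat{\mathbf{T}}$. According to the Lagrange multiplier theorem, the first-order necessary condition
for an optimal solution is the following: there exists a scalar $\gamma \ge 0$ such that:

$$\frac{d}{d\mathbf{T}}\, \mathrm{tr}\left((\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1} + \gamma\,\mathbf{T}^H\mathbf{T}\right)\Big|_{\mathbf{T} = \hat{\mathbf{T}}} = \mathbf{0}. \qquad (3.23)$$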

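For notational simplicity, let

$$\mathbf{M} = \hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} + \boldsymbol{\Delta}^{-1}. \qquad (3.24)$$

For any invertible matrix $\mathbf{M}$, the derivative of the inverse of a matrix [72] is given as:

$$\frac{d\mathbf{M}^{-1}}{d\mathbf{T}} = -\mathbf{M}^{-1}\frac{d\mathbf{M}}{d\mathbf{T}}\mathbf{M}^{-1}.$$

Hence, (3.23) is equivalent to:

$$\mathrm{tr}\left(\gamma[\hat{\mathbf{T}}^H\delta\mathbf{T} + \delta\mathbf{T}^H\hat{\mathbf{T}}] - \mathbf{M}^{-1}[\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\delta\mathbf{T} + \delta\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}]\mathbf{M}^{-1}\right) = 0$$

for all matrices $\delta\mathbf{T} \in \mathbb{C}^{N \times n_t}$.

Let $\mathrm{Real}(z)$ denote the real part of $z \in \mathbb{C}$. Based on the fact that $\mathrm{tr}(\mathbf{A} + \mathbf{A}^H) =
2\,\mathrm{Real}[\mathrm{tr}(\mathbf{A})]$ and $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$, we have

$$\mathrm{Real}\left[\mathrm{tr}\left(\gamma\hat{\mathbf{T}}^H\delta\mathbf{T} - \mathbf{M}^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\delta\mathbf{T}\right)\right] = 0.$$

By taking $\delta\mathbf{T}$ either pure real or pure imaginary, we deduce that

$$\mathrm{tr}\left([\gamma\hat{\mathbf{T}}^H - \mathbf{M}^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}]\,\delta\mathbf{T}\right) = 0$$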
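for all $\delta\mathbf{T}$. By choosing $\delta\mathbf{T}$ to be completely zero except for a single nonzero entry, we conclude
that

$$\gamma\hat{\mathbf{T}}^H - \mathbf{M}^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1} = \mathbf{0}. \qquad (3.25)$$

If $\gamma = 0$, then $\hat{\mathbf{T}} = \mathbf{0}$ since both $\boldsymbol{\Lambda}$ and $\boldsymbol{\Delta}$ (and hence $\mathbf{M}$) are invertible, which contradicts
the fact that $\hat{\mathbf{T}}$ is nonzero. Hence, $\gamma > 0$.
We multiply (3.25) on the right by $\hat{\mathbf{T}}$ to obtain

$$\gamma\hat{\mathbf{T}}^H\hat{\mathbf{T}} = \mathbf{M}^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} = (\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}. \qquad (3.26)$$

Since $\hat{\mathbf{T}}^H\hat{\mathbf{T}}$ is Hermitian, we have

$$(\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} = \hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}(\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}.$$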
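We now show that $\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ commute with each other. We need the following
lemma [73]:

Lemma 3.3.1. If $\mathbf{A}$ and $\mathbf{B}$ are diagonalizable, they share the same eigenvector matrix if and
only if $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A}$.

For simplicity of notation, let $\mathbf{A} = \hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ and $\mathbf{B} = \boldsymbol{\Delta}^{-1}$. Then we have

$$(\mathbf{A} + \mathbf{B})^{-2}\mathbf{A} = \mathbf{A}(\mathbf{A} + \mathbf{B})^{-2}.$$

According to Lemma 3.3.1, $\mathbf{A}$ and $(\mathbf{A} + \mathbf{B})^{-2}$ share the same eigenvector matrix. Since $\mathbf{A} + \mathbf{B}$
and $(\mathbf{A} + \mathbf{B})^{-2}$ have the same eigenvector matrix, $\mathbf{A}$ and $\mathbf{A} + \mathbf{B}$ share the same eigenvector
matrix. Then we have

$$\mathbf{A}(\mathbf{A} + \mathbf{B}) = (\mathbf{A} + \mathbf{B})\mathbf{A}.$$

Hence,

$$\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A},$$

which implies that $\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ commute with each other. Since $\boldsymbol{\Delta}^{-1}$ is diagonal with
distinct diagonal elements by (3.22), $\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ is diagonal. Since $\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ is diagonal,
$\hat{\mathbf{T}}^H\hat{\mathbf{T}}$ is diagonal by (3.26).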

Since $\hat{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\hat{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ are diagonal, both $\mathbf{M}$ and $\mathbf{M}^{-2}$ are diagonal. Hence, the factor
$\mathbf{M}^{-2}$ in (3.25) is diagonal with real diagonal elements denoted $e_j$, $1 \le j \le n_t$. By (3.25), we
have

$$\gamma\,\hat{t}_{ij} = \frac{e_j\,\hat{t}_{ij}}{\lambda_i}. \qquad (3.27)$$

If $\hat{t}_{ij} \ne 0$, then (3.27) implies that

$$\frac{e_j}{\lambda_i} = \gamma.$$

By the nondegeneracy condition (3.22), no two diagonal elements of $\boldsymbol{\Lambda}$ are equal. If for any
fixed $j$, $\hat{t}_{ij} \ne 0$ for $i = i_1$ and $i_2$, then the identity $e_j/\lambda_i = \gamma$ yields a contradiction since $\gamma \ne 0$
and $\lambda_{i_1} \ne \lambda_{i_2}$. Hence, each column of $\hat{\mathbf{T}}$ has at most one nonzero entry. Since $\hat{\mathbf{T}}^H\hat{\mathbf{T}}$ is diagonal,
two different columns cannot have their single nonzero entry in the same row. This implies that each
column and each row of $\hat{\mathbf{T}}$ have at most one nonzero entry. A suitable permutation of the rows and
columns of $\hat{\mathbf{T}}$ gives a diagonal matrix $\boldsymbol{\Sigma}$, which completes the proof. □

Combining the relationship (3.20) between $T$ and $S$ and Theorem 3.3.1, we conclude that problem (3.18) has a solution of the form $S = U \Pi_1 \Sigma \Pi_2 V^H$, where $\Pi_1$ and $\Pi_2$ are permutation matrices. We will show that we can eliminate one of these two permutation matrices.

Substituting $S = U \Pi_1 \Sigma \Pi_2 V^H$ in (3.18), the equivalent optimization problem is obtained as:

$$\min_{\Sigma, \Pi_1, \Pi_2} \mathrm{tr}\left(\left(\Sigma^H (\Pi_1^H \Lambda^{-1} \Pi_1) \Sigma + \Pi_2 \Delta^{-1} \Pi_2^H\right)^{-1}\right) \tag{3.28}$$

$$\text{subject to } \sum_{i=1}^{M} \sigma_i^2 \le P,$$

where $M$ represents the minimum of $N$ and $n_t$. In the above optimization problem, the minimization is over diagonal matrices $\Sigma$ with $\sigma_1, \ldots, \sigma_M$ as the diagonal elements, and two permutation matrices $\Pi_1$ and $\Pi_2$.

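Since the symmetric permutations $\Pi_1^H \Lambda^{-1} \Pi_1$ and $\Pi_2 \Delta^{-1} \Pi_2^H$ essentially interchange diagonal elements of $\Lambda$ and $\Delta$, (3.28) is equivalent to

$$\min_{\sigma, \pi_1, \pi_2} \sum_{i=1}^{M} \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}} \tag{3.29}$$

$$\text{subject to } \sum_{i=1}^{M} \sigma_i^2 \le P, \quad \pi_1 \in \mathcal{P}_N, \quad \pi_2 \in \mathcal{P}_{n_t},$$

where $\mathcal{P}_N$ is the set of bijections of $\{1, 2, \ldots, N\}$ onto itself.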
We will now show that the optimal solution only depends on the smallest eigenvalues of $Q_N$ and the largest eigenvalues of $R_t$.

Lemma 3.3.2. Let $U \Lambda U^H$ and $V \Delta V^H$ be diagonalizations of $Q_N$ and $R_t$ respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors. Let $\sigma$, $\pi_1$, and $\pi_2$ denote an optimal solution of (3.29) and define the sets

$$\mathcal{M} = \{i : \sigma_i > 0\}, \quad \mathcal{Q} = \{\lambda_{\pi_1(i)} : i \in \mathcal{M}\}, \quad \text{and} \quad \mathcal{R} = \{\delta_{\pi_2(i)} : i \in \mathcal{M}\}.$$

If $\mathcal{M}$ has $l$ elements, then the elements of the set $\mathcal{Q}$ constitute the $l$ smallest eigenvalues of $Q_N$, and the elements of $\mathcal{R}$ constitute the $l$ largest eigenvalues of $R_t$, respectively.

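Proof. Assume $k \notin \mathcal{M}$ and $\lambda_{\pi_1(k)} < \lambda_{\pi_1(i)}$ for some $i \in \mathcal{M}$. It is easy to see that by interchanging the values of $\pi_1(i)$ and $\pi_1(k)$, the new $i$-th term in the cost function is smaller than the previous $i$-th term. This contradicts the assumed optimality of $\sigma$ and $\pi$. Hence $\lambda_{\pi_1(k)} \ge \lambda_{\pi_1(i)}$.

Then, suppose that $k \notin \mathcal{M}$ and $\delta_{\pi_2(k)} > \delta_{\pi_2(i)}$ for some $i \in \mathcal{M}$. Let $C$ denote the cost value due to the sum of the $i$-th term and the $k$-th term before the interchange. Similarly, let $C^+$ denote the cost value due to the sum of the $i$-th term and the $k$-th term after the interchange of the values of $\pi_2(i)$ and $\pi_2(k)$. We have

$$C = \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}} + \delta_{\pi_2(k)}$$

and

$$C^+ = \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(k)}} + \delta_{\pi_2(i)}.$$

Since $\delta_{\pi_2(k)} > \delta_{\pi_2(i)}$,

$$C^+ - C = -\frac{(\delta_{\pi_2(k)} - \delta_{\pi_2(i)})\left(\sigma_i^4 \delta_{\pi_2(k)} \delta_{\pi_2(i)}/\lambda_{\pi_1(i)}^2 + \sigma_i^2 \delta_{\pi_2(k)}/\lambda_{\pi_1(i)} + \sigma_i^2 \delta_{\pi_2(i)}/\lambda_{\pi_1(i)}\right)}{\left(1 + \sigma_i^2 \delta_{\pi_2(k)}/\lambda_{\pi_1(i)}\right)\left(1 + \sigma_i^2 \delta_{\pi_2(i)}/\lambda_{\pi_1(i)}\right)} < 0.$$

The cost is reduced by interchanging the values of $\pi_2(i)$ and $\pi_2(k)$, which violates the optimality of $\sigma$ and $\pi$. Hence, $\delta_{\pi_2(k)} \le \delta_{\pi_2(i)}$. □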
Using Lemma 3.3.2, we now show that one of the permutations in (3.29) can be deleted if the eigenvalues of $Q_N$ and $R_t$ are arranged in a particular order.

Theorem 3.3.2. Let $U \Lambda U^H$ and $V \Delta V^H$ be diagonalizations of $Q_N$ and $R_t$ respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors, the eigenvalues of $Q_N$ are arranged in increasing order and the eigenvalues of $R_t$ are arranged in decreasing order. If $M$ is the minimum of the rank of $Q_N$ and $R_t$, then (3.29) is equivalent to

$$\min_{\sigma, \pi} \sum_{i=1}^{M} \frac{1}{\sigma_i^2/\lambda_{\pi(i)} + 1/\delta_i} \tag{3.30}$$

$$\text{subject to } \sum_{i=1}^{M} \sigma_i^2 \le P, \quad \pi \in \mathcal{P}_M,$$

where $\sigma_i = 0$ for $i > M$.

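Proof. Since at most $M$ eigenvalues of either $Q_N$ or $R_t$ are nonzero, it follows from Lemma 3.3.2 that the set $\mathcal{M}$ has at most $M$ elements. Since the elements of $\mathcal{Q}$ are the smallest eigenvalues of $Q_N$ and the elements of $\mathcal{R}$ are the largest eigenvalues of $R_t$, we can assume that $\pi_1(i) \in [1, M]$ and $\pi_2(i) \in [1, M]$ for each $i \in \mathcal{M}$. Hence, we restrict the sum in (3.29) to those indices $i \in \mathcal{S}$ where

$$\mathcal{S} = \{\pi_2^{-1}(i) : 1 \le i \le M\}.$$

Let us define $\sigma'_j = \sigma_{\pi_2^{-1}(j)}$ and $\pi(j) = \pi_1(\pi_2^{-1}(j))$. Since $\pi(j) \in [1, M]$ for $j \in [1, M]$, it follows that $\pi \in \mathcal{P}_M$. In (3.29) we restrict the summation to $i \in \mathcal{S}$ and we replace $i$ by $\pi_2^{-1}(j)$ to obtain

$$\sum_{i \in \mathcal{S}} \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}} = \sum_{j=1}^{M} \frac{1}{(\sigma'_j)^2/\lambda_{\pi(j)} + 1/\delta_j}, \quad \text{where } \sum_{j=1}^{M} (\sigma'_j)^2 \le P.$$

This completes the proof of (3.30). □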

Combining the relationship (3.20) between $T$ and $S$, Theorem 3.3.1 and Theorem 3.3.2 yields the following corollary:

Corollary 3.3.1. Problem (3.18) has a solution of the form $S = U \Pi \Sigma V^H$, where the columns of $U$ and $V$ are orthonormal eigenvectors of $Q_N$ and $R_t$ respectively, with the eigenvalues of $Q_N$ arranged in increasing order and the eigenvalues of $R_t$ arranged in decreasing order, $\Pi$ is a permutation matrix, and $\Sigma$ is diagonal.

Proof. Let $\sigma$ and $\pi$ be a solution of (3.30). For $i > M$, define $\pi(i) = i$ and $\sigma_i = 0$. If $\Pi$ is the permutation matrix corresponding to $\pi$, then making the substitution $S = U \Pi \Sigma V^H$ in the cost function of (3.18) yields the cost function in (3.30). Since (3.29) and (3.30) are equivalent by Theorem 3.3.2, $S$ is optimal in (3.18). □

3.3.2 The Optimal Σ

We now consider the optimization problem which minimizes the cost function over $\sigma$ with the permutation $\pi$ in (3.30) given. In the next subsection, we will find the optimal permutation $\pi$ based on the solution to the optimization problem considered here. For notational simplicity, let $p_i$ denote $1/\lambda_{\pi(i)}$ and $q_i$ denote $1/\delta_i$. Hence, for fixed $\pi$, (3.30) is equivalent to the following optimization problem:

$$\min_{\sigma} \sum_{i=1}^{M} \frac{1}{p_i \sigma_i^2 + q_i} \tag{3.31}$$

$$\text{subject to } \sum_{i=1}^{M} \sigma_i^2 \le P.$$

The solution of (3.31) can be expressed in terms of a Lagrange multiplier related to the power constraint. The structure of this solution has a water-filling interpretation in the communication literature [74].

Theorem 3.3.3. The optimal solution of (3.31) is given by

$$\sigma_i = \left(\max\left\{\sqrt{\frac{1}{p_i \mu}} - \frac{q_i}{p_i},\ 0\right\}\right)^{1/2}, \tag{3.32}$$

where the parameter $\mu$ is chosen so that

$$\sum_{i=1}^{M} \sigma_i^2 = P. \tag{3.33}$$
Proof. Since the minimization of the cost function in (3.31) is over a closed and bounded set, there exists a solution. At an optimal solution to (3.31), the power constraint must hold with equality. Otherwise, we can multiply $\sigma$ by a scalar larger than 1 to reduce the value of the cost function. For notational simplicity, let $t_i = \sigma_i^2$. Then the reduced optimization problem (3.31) is equivalent to

$$\min_{t} \sum_{i=1}^{M} \frac{1}{p_i t_i + q_i} \tag{3.34}$$

$$\text{subject to } \sum_{i=1}^{M} t_i = P, \quad t \ge 0.$$

Since the cost function is strictly convex and the constraint is convex, the optimal solution to (3.34) is unique.

According to the Lagrange multiplier theorem, the first-order necessary conditions [51] (Karush-Kuhn-Tucker conditions) for an optimal solution of (3.34) are the following: there exists a scalar $\mu \ge 0$ and a vector $\nu \in \mathbb{R}^M$ such that

$$-\frac{p_i}{(p_i t_i + q_i)^2} + \mu - \nu_i = 0, \quad \nu_i \ge 0, \quad t_i \ge 0, \quad \nu_i t_i = 0, \quad 1 \le i \le M. \tag{3.35}$$

Due to the convexity of the cost and the constraint, any solution of these conditions is the unique optimal solution of (3.34).

A solution to (3.35) can be obtained as follows. We define the function

$$t_i(\mu) = \left(\sqrt{\frac{1}{p_i \mu}} - \frac{q_i}{p_i}\right)^+. \tag{3.36}$$

Here $x^+ = \max\{x, 0\}$. This particular value for $t_i$ is obtained by setting $\nu_i = 0$ in (3.35) and solving for $t_i$; when the solution is $< 0$, we set $t_i(\mu) = 0$ (this corresponds to the $+$ operator in (3.36)). We note that $t_i(\mu)$ is a decreasing function of $\mu$ which approaches $+\infty$ as $\mu$ approaches 0 and which approaches 0 as $\mu$ grows to $+\infty$. Hence, the equation

$$\sum_{i=1}^{M} t_i(\mu) = P \tag{3.37}$$

has a unique positive solution. Since $t_i(p_i/q_i^2) = 0$, we have $t_i(\mu) = 0$ for $\mu \ge p_i/q_i^2$. Then we have

$$-\frac{p_i}{(p_i t_i(\mu) + q_i)^2} + \mu = -\frac{p_i}{q_i^2} + \mu \ge 0 \quad \text{for } \mu \ge p_i/q_i^2.$$

We deduce that the Karush-Kuhn-Tucker conditions can be satisfied when $\mu$ is the positive solution of (3.37). □
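The water-filling solution can be computed numerically by solving (3.37) for the multiplier. The following is a minimal sketch, assuming NumPy and strictly positive $p_i$, $q_i$ and $P$; the function name `water_fill` and the use of bisection are illustrative choices, not part of the original derivation.

```python
import numpy as np

def water_fill(p, q, P, iters=200):
    """Minimise sum_i 1/(p_i t_i + q_i) subject to sum_i t_i = P, t_i >= 0.

    Hypothetical helper: finds the multiplier mu of (3.37) by bisection and
    returns (t, mu), where t_i = sigma_i^2.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)

    def t_of(mu):
        # t_i(mu) = (sqrt(1/(p_i mu)) - q_i/p_i)^+
        return np.maximum(1.0 / np.sqrt(p * mu) - q / p, 0.0)

    hi = np.max(p / q**2)        # every t_i(mu) vanishes for mu >= p_i/q_i^2
    lo = hi
    while t_of(lo).sum() < P:    # total power grows without bound as mu -> 0
        lo *= 0.5
    for _ in range(iters):       # bisection on the decreasing map mu -> sum_i t_i(mu)
        mid = 0.5 * (lo + hi)
        if t_of(mid).sum() > P:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return t_of(mu), mu
```

Bisection is used here only because the left-hand side of (3.37) is monotone in $\mu$; a sort-based closed-form search over the active set would work equally well.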
3.3.3 Optimal Eigenvector Ordering

Finally, we need to find an optimal permutation in (3.30), i.e., an optimal ordering for the eigenvalues of $Q_N$ and $R_t$.

Theorem 3.3.4. If the eigenvalues $\{\lambda_i\}$ of $Q_N$ are arranged in increasing order and the eigenvalues $\{\delta_i\}$ of $R_t$ are arranged in decreasing order, then an optimal permutation in (3.30) is

$$\pi(i) = i, \quad 1 \le i \le M. \tag{3.38}$$

Proof. Assume that there exist indices $i$ and $j$ such that $i < j$, $\lambda_i > \lambda_j$ and $\delta_i > \delta_j$, i.e., $p_i < p_j$ and $q_i < q_j$. Then $\lambda_i$ and $\lambda_j$ are not arranged in the supposed optimal order for the eigenvalues of $Q_N$. We will show that this leads to a contradiction.

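Let us consider the following optimization problem:

$$\min_{t_i, t_j} \frac{1}{p_i t_i + q_i} + \frac{1}{p_j t_j + q_j} \tag{3.39}$$

$$\text{subject to } t_i + t_j = P, \quad t_i \ge 0, \quad t_j \ge 0,$$

where $P = \sigma_i^2 + \sigma_j^2$. Since $\sigma$ yields an optimal solution of (3.30), it follows that a solution of the above optimization problem is $t_i = \sigma_i^2$ and $t_j = \sigma_j^2$. Based on Theorem 3.3.3, $t_i$ is given as

$$t_i(\mu) = \sqrt{\frac{1}{p_i \mu}} - \frac{q_i}{p_i}, \tag{3.40}$$

where $\mu$ is a Lagrange multiplier obtained from the power constraint $t_i + t_j = P$ as:

$$\sqrt{\mu} = \frac{\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}}{P + \frac{q_i}{p_i} + \frac{q_j}{p_j}}. \tag{3.41}$$

Let $C$ denote the cost function for (3.39). Combining (3.40) and (3.41) gives

$$C = \frac{1}{p_i t_i + q_i} + \frac{1}{p_j t_j + q_j} = \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{P + \frac{q_i}{p_i} + \frac{q_j}{p_j}}.$$

Now, suppose that we interchange the values of $p_i$ and $p_j$. Let $C^+$ denote the cost value associated with the interchange. With the assumption that the optimal solution of (3.39) is still positive after the exchange of $p_i$ and $p_j$, we have

$$C^+ = \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{P + \frac{q_i}{p_j} + \frac{q_j}{p_i}}. \tag{3.42}$$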
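We need to use the following lemma [40]:

Lemma 3.3.3. If $a_i$, $b_i$, $i = 1, \ldots, n$ are two sets of numbers, then

$$\sum_{i=1}^{n} a_{[i]} b_{[n-i+1]} \le \sum_{i=1}^{n} a_i b_i \le \sum_{i=1}^{n} a_{[i]} b_{[i]},$$

where $a_{[1]} \ge \cdots \ge a_{[n]}$ and $b_{[1]} \ge \cdots \ge b_{[n]}$ denote the two sets arranged in decreasing order.

From the above lemma, we have $\frac{q_i}{p_j} + \frac{q_j}{p_i} \ge \frac{q_i}{p_i} + \frac{q_j}{p_j}$. Then $C^+ \le C$.

If $C^+ < C$, it contradicts the optimality of $\sigma$. Then we have $C^+ = C$. Hence, for each $i$ and $j$ with $i < j$, $p_i < p_j$ and $q_i < q_j$, we can interchange the values of $p_i$ and $p_j$ to obtain a new permutation with the same value for the cost function. After the interchange, we have $p_i > p_j$, i.e., $\lambda_i < \lambda_j$. In this way, the $\lambda_i$ are arranged in increasing order. Since the $\delta_i$ are arranged in decreasing order, we conclude that the associated optimal permutation $\pi$ is (3.38).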

One technical point must now be checked: we should verify that if $p_i < p_j$ and $q_i < q_j$ with $i < j$, and if we exchange $p_i$ and $p_j$, then the corresponding optimal solution of (3.39) remains positive.

To check this, we consider two cases. For the first case, suppose $\sigma_k > 0$ with $i < j < k$, $p_k < p_i < p_j$ and $q_i < q_j < q_k$. From (3.40), we have

$$t_k = \sqrt{\frac{1}{p_k \mu}} - \frac{q_k}{p_k} > 0.$$

Then

$$\frac{1}{\sqrt{\mu}} > \frac{q_k}{\sqrt{p_k}}.$$

After the exchange,

$$t_i^+ = \frac{1}{\sqrt{p_j}}\left(\frac{1}{\sqrt{\mu}} - \frac{q_i}{\sqrt{p_j}}\right) > \frac{1}{\sqrt{p_j}}\left(\frac{q_k}{\sqrt{p_k}} - \frac{q_i}{\sqrt{p_j}}\right) > 0.$$

Similarly, $t_j^+ > 0$.

For the second case, suppose $j = \max(\mathcal{M})$ and $p_i = \min(\mathcal{Q})$, $p_i < p_j$ and $q_i < q_j$.

Since the original solution, before the exchange, is positive, it follows from (3.40) and (3.41) that

$$P > \frac{q_i}{\sqrt{p_i p_j}} - \frac{q_j}{p_j} \quad \text{and} \quad P > \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_i}. \tag{3.43}$$

After the exchange, the analogous inequalities that must be satisfied to preserve nonnegativity are

$$P > \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_j} \tag{3.44}$$

and

$$P > \frac{q_i}{\sqrt{p_i p_j}} - \frac{q_j}{p_i}. \tag{3.45}$$

(3.45) is satisfied from (3.43) and the fact that $p_i < p_j$ and $q_i < q_j$. If (3.44) is also satisfied, the proof is completed.

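If (3.44) is not satisfied, i.e., $P \le \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_j}$, the associated cost after the exchange is

$$C^* = \frac{1}{p_j P + q_i} + \frac{1}{q_j},$$

where $t_i = P$ and $t_j = 0$. We will show that $C^* < C$ with $P > \frac{q_i}{\sqrt{p_i p_j}} - \frac{q_j}{p_j}$, $P > \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_i}$, and $P \le \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_j}$. Letting $C^* < C$ gives

$$\frac{1}{p_j P + q_i} + \frac{1}{q_j} < \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{P + \frac{q_i}{p_i} + \frac{q_j}{p_j}}.$$

Multiplying both sides of the above inequality by $(p_j P + q_i)\, q_j \left(P + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right)$ gives

$$q_j\left(P + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right) + (p_j P + q_i)\left(P + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right) < \left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2 (p_j P + q_i)\, q_j.$$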

After  considerable  algebra  on  the  above  inequality,  we  find  that  to  show  C*  <  C  is  equivalent 
to  show  that  f{P)  <0  with 
when $P \in \left(\max\left[\frac{q_i}{\sqrt{p_i p_j}} - \frac{q_j}{p_j},\ \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_i}\right],\ \frac{q_j}{\sqrt{p_i p_j}} - \frac{q_i}{p_j}\right)$. Since


when  ^2^  -  ^  >  ^  -  ^, 


when  -^  -  2i  >  ^L=  -  ^,  and 

sfPiP]  Pt    ~"    VPtPj  Pj 

we have $C^* < C$. □
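The ordering claimed in Theorem 3.3.4 can be checked by brute force on small examples: for every permutation of the $\lambda_i$, solve the inner problem (3.31) and compare the resulting costs. The sketch below assumes NumPy; the names `min_cost` and `costs_by_permutation` are hypothetical, and the inner solver is a plain bisection on the multiplier in (3.37).

```python
import itertools
import numpy as np

def min_cost(p, q, P, iters=200):
    """Optimal value of (3.31) for fixed p, q, found by bisection on mu."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    t_of = lambda mu: np.maximum(1.0 / np.sqrt(p * mu) - q / p, 0.0)
    hi = np.max(p / q**2)        # all t_i are zero at this mu
    lo = hi
    while t_of(lo).sum() < P:
        lo *= 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t_of(mid).sum() > P:
            lo = mid
        else:
            hi = mid
    t = t_of(0.5 * (lo + hi))
    return float(np.sum(1.0 / (p * t + q)))

def costs_by_permutation(lam, delta, P):
    """Cost of (3.30) for every permutation pi, with lam sorted increasing
    and delta sorted decreasing as in Theorem 3.3.4."""
    lam = np.sort(np.asarray(lam, dtype=float))
    delta = np.sort(np.asarray(delta, dtype=float))[::-1]
    out = {}
    for perm in itertools.permutations(range(len(lam))):
        out[perm] = min_cost(1.0 / lam[list(perm)], 1.0 / delta, P)
    return out
```

In such an experiment the identity permutation should attain the smallest cost, in agreement with (3.38); the enumeration is factorial in $M$ and is only meant as a sanity check.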

Combining Corollary 3.3.1, Theorem 3.3.3 and Theorem 3.3.4, we conclude that the optimal training sequences should be designed according to the following theorem.

Theorem 3.3.5. Let $U \Lambda U^H$ and $V \Delta V^H$ be the diagonalizations of $Q_N$ and $R_t$ respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors, the corresponding eigenvalues $\{\lambda_i\}$ are arranged in increasing order and $\{\delta_i\}$ are arranged in decreasing order. Then the optimal solution of (3.18) is given by

$$S = U \Sigma V^H, \tag{3.47}$$

where $\Sigma$ specifies the power allocation and is diagonal with diagonal elements given by

$$\sigma_i = \left(\max\left\{\sqrt{\frac{\lambda_i}{\mu}} - \frac{\lambda_i}{\delta_i},\ 0\right\}\right)^{1/2} \quad \text{for } 1 \le i \le n_t, \tag{3.48}$$

and $\sigma_i = 0$ for $i > n_t$, with the parameter $\mu$ chosen so that

$$\sum_{i=1}^{n_t} \sigma_i^2 = P.$$

With the optimal training sequence, the channel estimator simplifies to

$$\hat{H} = Y U_{n_t} \Gamma V_{n_t}^H, \tag{3.49}$$

where $\Gamma = \mathrm{diag}\{\gamma_1, \ldots, \gamma_{n_t}\}$ with $\gamma_i = \frac{\sigma_i \delta_i}{\sigma_i^2 \delta_i + \lambda_i}$, the columns of $U_{n_t}$ are the eigenvectors of $Q_N$ corresponding to its $n_t$ smallest eigenvalues, and the columns of $V_{n_t}$ are the eigenvectors of $R_t$.
The design of the optimal training sequences summarized in the above theorem has a clear physical interpretation. Each eigenvector of the transmit correlation matrix $R_t$ represents a transmit eigen-direction and the associated eigenvalue indicates the channel gain in that eigen-direction. More power should be assigned to the signals transmitted along the eigen-directions with larger channel gains. On the other hand, each eigenvector of the interference temporal correlation matrix $Q_N$ represents an interference subspace and the corresponding eigenvalue indicates the amount of interference in that subspace. Hence, we should choose the subspaces with the least amount of interference for transmission. To facilitate the understanding of the physical meaning of the optimal training sequences, we can rewrite them in an alternative way as

$$S = \sum_{i=1}^{n_t} \sigma_i u_i v_i^H,$$

where $u_i$ are orthonormal eigenvectors of $Q_N$ with the corresponding eigenvalues arranged in increasing order and $v_i$ are orthonormal eigenvectors of $R_t$ with the corresponding eigenvalues arranged in decreasing order. The vectors $u_i$ and $v_i$ form transmission directions in time and space, respectively. The above theorem implies that the optimal training sequence design puts more power into the transmission directions constructed from the eigen-directions with larger channel gains and the interference subspaces with less interference. The power assignment is determined by the water-filling argument under a finite power constraint.
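The construction in Theorem 3.3.5 can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: NumPy is used, $Q_N$ and $R_t$ are Hermitian positive definite with $N \ge n_t$, and the helper names `water_fill`, `optimal_training` and `mse_cost` are hypothetical. The cost evaluated is $\mathrm{tr}((S^H Q_N^{-1} S + R_t^{-1})^{-1})$.

```python
import numpy as np

def water_fill(p, q, P, iters=200):
    """Power allocation t_i = sigma_i^2 from (3.36)-(3.37), by bisection on mu."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    t_of = lambda mu: np.maximum(1.0 / np.sqrt(p * mu) - q / p, 0.0)
    hi = np.max(p / q**2)
    lo = hi
    while t_of(lo).sum() < P:
        lo *= 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t_of(mid).sum() > P:
            lo = mid
        else:
            hi = mid
    return t_of(0.5 * (lo + hi))

def optimal_training(Q, R, P):
    """S = U Sigma V^H with eigenvalues of Q increasing and of R decreasing."""
    lam, U = np.linalg.eigh(Q)            # eigh returns increasing eigenvalues
    dlt, V = np.linalg.eigh(R)
    dlt, V = dlt[::-1], V[:, ::-1]        # reorder to decreasing
    nt = R.shape[0]
    t = water_fill(1.0 / lam[:nt], 1.0 / dlt, P)
    # scale the nt least-interfered temporal directions, then rotate by V^H
    return (U[:, :nt] * np.sqrt(t)) @ V.conj().T

def mse_cost(S, Q, R):
    """tr((S^H Q^{-1} S + R^{-1})^{-1})."""
    M = S.conj().T @ np.linalg.solve(Q, S) + np.linalg.inv(R)
    return float(np.trace(np.linalg.inv(M)).real)
```

A simple way to exercise the sketch is to compare `mse_cost` of the constructed $S$ against randomly drawn training matrices scaled to the same total power $P$.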

3.4 Estimation of Channel Statistics and Feedback Design
To implement the channel estimator and construct the optimal training sequences for channel estimation, we need the knowledge of the transmit antenna correlation matrix $R_t$ and the interference covariance matrix $Q_N$ at both the receiver and transmitter sides. Since these two matrices are long-term channel characteristics, they can be estimated by using the observed training signals at the receiver end and then fed back to the transmitter end for the construction of the optimal training sequences. In this section, we propose an algorithm to estimate these long-term channel characteristics and design an efficient feedback scheme so that we can approximately construct the optimal training sequences at the transmitter end.

Let us assume that the training signal matrix $S$ is sent over a block of $K$ packets. The received training signals for the $n$th packet are given as

$$\begin{aligned} y(n) &= (S \otimes I_{n_r})\, h(n) + e(n) \\ &= (S \otimes I_{n_r})(R_t^{1/2} \otimes R_r^{1/2})\, h_w(n) + e(n) \\ &= (S R_t^{1/2} \otimes R_r^{1/2})\, h_w(n) + e(n). \end{aligned} \tag{3.50}$$

We can calculate the sample average correlation matrix of the received signal from the previous $K$ packets as follows:

$$\hat{R} = \frac{1}{K} \sum_{n=1}^{K} y(n)\, y(n)^H. \tag{3.51}$$

It is easy to see that $\hat{R}$ is a sufficient statistic for the estimation of the second-order correlation matrices $R_t$ and $Q_N$ if $e(n)$ is Gaussian distributed.

We can show that the correlation matrix of the received signal has the Kronecker product form:

$$\begin{aligned} R &= E(y(n)\, y(n)^H) \\ &= (S R_t^{1/2} \otimes R_r^{1/2})\, E(h_w(n)\, h_w(n)^H)\, (R_t^{1/2} S^H \otimes R_r^{1/2}) + E(e(n)\, e(n)^H) \\ &= (S R_t S^H) \otimes R_r + Q_N \otimes R_r \\ &= (S R_t S^H + Q_N) \otimes R_r \\ &= R_s \otimes R_r, \end{aligned} \tag{3.52}$$

where $R_s = S R_t S^H + Q_N$. If $R = R_s \otimes R_r$, then $R = \alpha R_s \otimes \frac{1}{\alpha} R_r$ for any $\alpha \ne 0$. Hence, $R_s$ and $R_r$ cannot be uniquely identified from observing $y(n)$. Fortunately, the channel estimator and the design of optimal sequences are invariant to scaling of the estimates of $R_t$ and $Q_N$. This can be explained as follows:


$$\hat{H}'(n) = Y(n)\, (\alpha Q_N)^{-1} S \left(S^H (\alpha Q_N)^{-1} S + (\alpha R_t)^{-1}\right)^{-1} = \hat{H}(n)$$

and

$$\mathrm{tr}\left(\left(S^H (\alpha Q_N)^{-1} S + (\alpha R_t)^{-1}\right)^{-1}\right) = \alpha\, \mathrm{tr}\left(\left(S^H Q_N^{-1} S + R_t^{-1}\right)^{-1}\right).$$

We notice that the new cost function of the optimization problem is just a scaled version of the original cost function.
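The scaling invariance above is easy to confirm numerically. The following sketch assumes NumPy and Hermitian positive definite $Q_N$ and $R_t$; `estimator_gain` and `mse_cost` are hypothetical helper names, and the estimator is taken to be $\hat{H} = Y G$ with $G = Q_N^{-1} S (S^H Q_N^{-1} S + R_t^{-1})^{-1}$ as in the expression above.

```python
import numpy as np

def estimator_gain(S, Q, R):
    """G = Q^{-1} S (S^H Q^{-1} S + R^{-1})^{-1}, so that H_hat = Y @ G."""
    QiS = np.linalg.solve(Q, S)
    M = S.conj().T @ QiS + np.linalg.inv(R)
    return QiS @ np.linalg.inv(M)

def mse_cost(S, Q, R):
    """tr((S^H Q^{-1} S + R^{-1})^{-1})."""
    M = S.conj().T @ np.linalg.solve(Q, S) + np.linalg.inv(R)
    return float(np.trace(np.linalg.inv(M)).real)

def scaling_check(S, Q, R, alpha):
    """Return (gain unchanged?, cost scaled by alpha?) for Q, R -> alpha*Q, alpha*R."""
    same_gain = np.allclose(estimator_gain(S, alpha * Q, alpha * R),
                            estimator_gain(S, Q, R))
    scaled_cost = np.isclose(mse_cost(S, alpha * Q, alpha * R),
                             alpha * mse_cost(S, Q, R))
    return bool(same_gain), bool(scaled_cost)
```

Both flags should be true for any $\alpha > 0$, which is why only the scale-normalised estimates of the two matrices are needed.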

For the estimation of $R_s$ and $R_r$, we need to impose an additional constraint on $R_r$. Here we force $\mathrm{tr}(R_r) = n_r$. Then an iterative flip-flop algorithm [75, 76, 77] can be used to estimate $R_s$ and $R_r$. If the received interference signal $e(n)$ is Gaussian distributed, the flip-flop algorithm provides the maximum likelihood estimates (MLE) of $R_s$ and $R_r$ [75] when it converges. Otherwise, the algorithm gives the estimates of $R_s$ and $R_r$ in the least squares sense. For fixed $R_r$, the MLE of $R_s$ is obtained as

$$\hat{R}_s = \frac{1}{n_r K} \sum_{u=1}^{n_r} \sum_{v=1}^{n_r} \sigma_r^{uv} \left\{ \sum_{n=1}^{K} Y_u^H(n)\, Y_v(n) \right\} \tag{3.53}$$

where $\sigma_r^{uv}$ is the $(u, v)$th element of $R_r^{-1}$ and $Y_u(n)$ is the $u$th row vector of the received signal matrix $Y(n)$. Similarly, for fixed $R_s$, the MLE of $R_r$ is obtained as

$$\hat{R}_r = \frac{1}{N K} \sum_{u=1}^{N} \sum_{v=1}^{N} \sigma_s^{uv} \left\{ \sum_{n=1}^{K} W_u(n)\, W_v^H(n) \right\} \tag{3.54}$$

where $\sigma_s^{uv}$ is the $(u, v)$th element of $R_s^{-1}$ and $W_u(n)$ is the $u$th column vector of the received signal $Y(n)$. Then to get uniquely identifiable $R_s$ and $R_r$, we need to scale $\hat{R}_r$ to make $\mathrm{tr}(\hat{R}_r) = n_r$. We note that the terms inside the braces in (3.53) and (3.54) can be computed before running the iterative estimation algorithm to reduce computational complexity. To start the iterative algorithm, an initial value of either $\hat{R}_s$ or $\hat{R}_r$ should be assigned. A natural choice is to initially set $\hat{R}_r = I_{n_r}$. Then the iterative algorithm alternates between the calculation of $\hat{R}_s$ and $\hat{R}_r$ until convergence. While it is difficult to prove analytically that the algorithm converges to the MLE, extensive numerical experiments [75] in statistics show that it always converges to the MLE for practical sample sizes. The convergence in our case is also verified by the numerical results in Section 3.5.
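A compact sketch of the flip-flop iteration is given below. It is illustrative only: it assumes NumPy, received matrices $Y(n)$ of size $n_r \times N$ stacked in an array of shape `(K, n_r, N)`, and the model $E[\mathrm{vec}(Y)\mathrm{vec}(Y)^H] = R_s \otimes R_r$. The placement of the complex conjugates follows from that assumed model and may differ from the convention used in (3.53) and (3.54); the function name `flip_flop` is hypothetical.

```python
import numpy as np

def flip_flop(Y, iters=30):
    """Alternating estimates of R_s (N x N) and R_r (n_r x n_r), tr(R_r) = n_r.

    Y has shape (K, n_r, N). Assumed model: cov(vec(Y)) = R_s kron R_r.
    """
    K, nr, N = Y.shape
    Yc = Y.conj()
    Rr = np.eye(nr, dtype=complex)          # natural starting point
    Rs = np.eye(N, dtype=complex)
    for _ in range(iters):
        Wr = np.linalg.inv(Rr).conj()
        # R_s update: average over packets of Y^T conj(R_r^{-1}) conj(Y)
        Rs = np.einsum('kui,uv,kvj->ij', Y, Wr, Yc) / (nr * K)
        Ws = np.linalg.inv(Rs).conj()
        # R_r update: average over packets of Y conj(R_s^{-1}) Y^H
        Rr = np.einsum('kui,ij,kvj->uv', Y, Ws, Yc) / (N * K)
        # resolve the scale ambiguity alpha*R_s kron (1/alpha)*R_r
        c = np.trace(Rr).real / nr
        Rr, Rs = Rr / c, Rs * c
    return Rs, Rr
```

Because only the Kronecker product is identifiable, a fair accuracy check compares $\hat{R}_s \otimes \hat{R}_r$ with $R_s \otimes R_r$ rather than the individual factors.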
Then $R_t$ and $Q_N$ can be estimated based on $R_s$. We note that only $\mathcal{R}(Q_N) \cap \mathcal{R}^{\perp}(S)$ can be uniquely identified from $R_s$ in the sense below ($\mathcal{R}$ denotes the range space of a matrix and $\mathcal{R}^{\perp}$ denotes the perpendicular subspace of the range of a matrix):

Lemma 3.4.1. Let $R_t$ and $Q_N$ be Hermitian positive semi-definite matrices and $R_s = S R_t S^H + Q_N$, where $S$ is of full rank. Let $\mathcal{D} = \{(R_t, Q_N) : \mathcal{R}(Q_N) \subseteq \mathcal{R}^{\perp}(S)\}$. Then there is a 1-1 correspondence between $R_s$ and $(R_t, Q_N)$ only for the pairs $(R_t, Q_N)$ in $\mathcal{D}$.

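Proof. Let $P_S = S(S^H S)^{-1} S^H$ be the projection onto $\mathcal{R}(S)$ and $P_S^{\perp} = I - P_S$ be the projection onto $\mathcal{R}^{\perp}(S)$.

First, let $(R_t, Q_N), (R'_t, Q'_N) \in \mathcal{D}$. Let $R_s = S R_t S^H + Q_N$ and $R'_s = S R'_t S^H + Q'_N$. Consider $P_S^{\perp} R_s = P_S^{\perp} Q_N = Q_N$, $P_S R_s = S R_t S^H$, and $P_S^{\perp} R'_s = P_S^{\perp} Q'_N = Q'_N$, $P_S R'_s = S R'_t S^H$. Since $S$ is of full rank, $P_S R_s = P_S R'_s$ iff $R_t = R'_t$. Also, since $P_S$ and $P_S^{\perp}$ are projections onto complementary subspaces, $R_s = R'_s$ iff $P_S R_s = P_S R'_s$ and $P_S^{\perp} R_s = P_S^{\perp} R'_s$, i.e., $(R_t, Q_N) = (R'_t, Q'_N)$.

Conversely, let $(R_t, Q_N) \in \mathcal{D}$ and $R_s = S R_t S^H + Q_N$. Now choose $R'_t \ne R_t$ and define $Q'_N = Q_N + S R_t S^H - S R'_t S^H$. Since $\mathcal{R}(Q_N) \subseteq \mathcal{R}^{\perp}(S)$ and $S$ is of full rank, $(R'_t, Q'_N) \notin \mathcal{D}$. But $R'_s = S R'_t S^H + Q'_N = S R_t S^H + Q_N = R_s$. □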

Based on the above lemma, we see that estimating $Q_N$ and $R_t$ simultaneously from $R_s$ is not possible. However, since $P_S^{\perp} R_s P_S^{\perp} = P_S^{\perp} Q_N P_S^{\perp}$, we can estimate $Q_N$ from $P_S^{\perp} \hat{R}_s P_S^{\perp}$. For notational simplicity, let $A$ denote $P_S^{\perp} \hat{R}_s P_S^{\perp}$. Since the interference signals are wide-sense stationary in time, $Q_N$ is a Toeplitz matrix which can be represented by a sequence $\{q_k;\ k = 0, \pm 1, \ldots, \pm(N-1)\}$ with $Q_N = \{q_{k,j}\} = \{q_{k-j}\}$. Then the $ij$th element of $P_S^{\perp} Q_N P_S^{\perp}$ is given by $\sum_l \sum_k p_{il}\, q_{l-k}\, p_{kj}$, with $p_{ij}$ denoting the $ij$th element of $P_S^{\perp}$. Equating the $ij$th element of $P_S^{\perp} Q_N P_S^{\perp}$ with $a_{ij}$, we have a set of linear equations in $\{q_k\}$. Noticing the Hermitian nature of $P_S^{\perp} Q_N P_S^{\perp}$ and $A$, and separating the real and imaginary parts of $q_k$ and $a_{ij}$, we have $N^2$ linear equations with $2N - 1$ unknowns in $q_r = [q_0, \mathrm{Re}(q_1), \mathrm{Im}(q_1), \ldots, \mathrm{Re}(q_{N-1}), \mathrm{Im}(q_{N-1})]^T$. The set of linear equations can be solved by employing the least squares approach. Then the estimate of $Q_N$ can be constructed based on $q_r$.
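The least squares step can be sketched as follows, assuming NumPy. The helper names `toeplitz_from_params` and `estimate_toeplitz` are hypothetical. For simplicity the sketch stacks the real and imaginary parts of all $N^2$ entries (giving $2N^2$ real equations, some redundant) instead of exploiting the Hermitian symmetry to reduce the count.

```python
import numpy as np

def toeplitz_from_params(x):
    """Hermitian Toeplitz matrix from x = [q0, Re q1, Im q1, ..., Re q_{N-1}, Im q_{N-1}]."""
    x = np.asarray(x, dtype=float)
    N = (len(x) + 1) // 2
    q = np.empty(N, dtype=complex)
    q[0] = x[0]
    q[1:] = x[1::2] + 1j * x[2::2]
    k, j = np.indices((N, N))
    d = k - j
    # entry (k, j) is q_{k-j}, with q_{-m} = conj(q_m)
    return np.where(d >= 0, q[np.abs(d)], np.conj(q[np.abs(d)]))

def estimate_toeplitz(A, P_perp):
    """Least squares fit of a Hermitian Toeplitz Q such that P_perp Q P_perp ~ A."""
    N = A.shape[0]
    stack = lambda B: np.concatenate([B.real.ravel(), B.imag.ravel()])
    cols = []
    for m in range(2 * N - 1):
        e = np.zeros(2 * N - 1)
        e[m] = 1.0
        # the map x -> P_perp Q(x) P_perp is real-linear, so probe it with unit vectors
        cols.append(stack(P_perp @ toeplitz_from_params(e) @ P_perp))
    G = np.stack(cols, axis=1)
    x, *_ = np.linalg.lstsq(G, stack(A), rcond=None)
    return toeplitz_from_params(x)
```

When the linear system is rank deficient, `numpy.linalg.lstsq` returns the minimum-norm solution, so the fitted matrix reproduces $A$ after projection but need not equal the true $Q_N$.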
In addition, when $N$ is large, $Q_N$ can be approximated by a circulant matrix [78] with fixed eigenvectors as:

$$Q_N \approx F_N \Psi F_N^H, \tag{3.55}$$

where $F_N$ is the $N \times N$ FFT matrix and $\Psi$ is a diagonal matrix containing the eigenvalues $\psi_i$. We notice that we only require the $n_t$ smallest eigenvalues of $Q_N$ and their corresponding eigenvectors in constructing the optimal training sequences. With the circulant matrix approximation (3.55), this is equivalent to estimating the $n_t$ smallest eigenvalues $\psi_i$ and identifying the corresponding columns of $F_N$. The $n_t$ smallest positive eigenvalues of $\hat{Q}_N$ are used as the estimates of the $n_t$ smallest $\psi_i$, and the corresponding columns of $F_N$ are chosen as those closest to the eigenvectors associated with the $n_t$ smallest positive eigenvalues of $\hat{Q}_N$.

The estimates of the $n_t$ smallest $\psi_i$ and the $n_t$ indices of the chosen columns of $F_N$ are then fed back to the transmitter for the optimal training sequence construction. We notice that it is bandwidth efficient to feed back only these indices of $F_N$ instead of the whole eigenvectors of $Q_N$, because the number of training symbols $N$ during the training period is usually large.

To derive the estimator of $R_t$, we need the following lemma which establishes the asymptotic equivalence of $Q_N$ and $P_S^{\perp} Q_N P_S^{\perp}$ as $N$ increases.

Lemma 3.4.2. With the assumption that $Q_N$ is an absolutely summable Toeplitz matrix, $Q_N$ and $P_S^{\perp} Q_N P_S^{\perp}$ are asymptotically equivalent. Since $Q_N$ is Toeplitz, $P_S^{\perp} Q_N P_S^{\perp}$ is asymptotically Toeplitz.

Proof. Two definitions of the norm of a matrix, the strong norm and the weak norm [78, 79], are needed to study the asymptotic equivalence of two matrices. The strong norm $\|A\|$ is defined as

$$\|A\| = \max_{x : x^* x = 1} [x^* A^* A x]^{1/2} = \sqrt{\lambda_{\max}(A^* A)},$$

where $\lambda_{\max}$ represents the largest eigenvalue of a matrix. If $A$ is Hermitian, $\|A\| = |\lambda_{\max}(A)|$. The weak norm of $A$ is defined as

$$|A| = (n^{-1} \mathrm{Tr}[A^* A])^{1/2}.$$

Two sequences of $n \times n$ matrices $A_n$ and $B_n$ are said to be asymptotically equivalent [78] if $A_n$ and $B_n$ are uniformly bounded in strong norm:

$$\|A_n\|, \|B_n\| \le M < \infty$$

and $A_n - B_n$ approaches zero in weak norm as $n \to \infty$:

$$\lim_{n \to \infty} |A_n - B_n| = 0.$$

If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz.

Without loss of generality, we assume that $Q_N$ is an absolutely summable Toeplitz matrix. (For the temporal interference correlation matrix $Q_N$ arising from practical scenarios, such as the jamming signals and co-channel interference considered here, it is easy to verify that $Q_N$ is absolutely summable.) $Q_N$ can be represented by a sequence $\{q_k;\ k = 0, \pm 1, \pm 2, \ldots\}$ with $Q_N = \{q_{k,j}\} = \{q_{k-j}\}$ and $\sum_{k=-\infty}^{\infty} |q_k| < \infty$. It is shown [80] that $Q_N$ is bounded in strong norm as:

$$\|Q_N\| \le 2 \sum_{k=-\infty}^{\infty} |q_k| = 2 M_q < \infty.$$

Then we need to show that $\|P_S^{\perp} Q_N P_S^{\perp}\|$ is also bounded. Using the properties of the strong norm, we have

$$\begin{aligned} \|P_S^{\perp} Q_N P_S^{\perp}\| &= \|(I - P_S) Q_N (I - P_S)\| \\ &= \|Q_N - P_S Q_N - Q_N P_S + P_S Q_N P_S\| \\ &\le \|Q_N\| + \|P_S Q_N\| + \|Q_N P_S\| + \|P_S Q_N P_S\|. \end{aligned}$$

To proceed, we need the following lemma [40]:

Lemma 3.4.3. For two Hermitian positive semi-definite matrices $G$ and $H$, $\lambda_{\max}(GH) \le \lambda_{\max}(G)\, \lambda_{\max}(H)$.

Then, we have

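$$\|P_S Q_N\| = [\lambda_{\max}(Q_N P_S Q_N)]^{1/2} \le [\lambda_{\max}(Q_N)\, \lambda_{\max}(P_S)\, \lambda_{\max}(Q_N)]^{1/2} = \lambda_{\max}(Q_N) = \|Q_N\|.$$

Similarly, $\|Q_N P_S\| \le \|Q_N\|$ and $\|P_S Q_N P_S\| \le \|Q_N\|$. Thus, $\|P_S^{\perp} Q_N P_S^{\perp}\| \le 4 \|Q_N\| \le 8 M_q$. Let $M = 8 M_q$; then $\|Q_N\| \le M < \infty$ and $\|P_S^{\perp} Q_N P_S^{\perp}\| \le M < \infty$.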

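Next, we need to show that the distance between the two matrices goes to zero asymptotically in weak norm. Using the properties of the weak norm, we have

$$\begin{aligned} |Q_N - P_S^{\perp} Q_N P_S^{\perp}| &= |P_S Q_N + Q_N P_S - P_S Q_N P_S| \\ &\le |P_S Q_N| + |Q_N P_S| + |P_S Q_N P_S|. \end{aligned}$$

We need the following lemma [78, 80]:

Lemma 3.4.4. Given two $n \times n$ matrices $G$ and $H$, then

$$|GH| \le \|G\|\, |H|.$$

The weak norm of $P_S$ can be written as

$$|P_S| = (N^{-1} \mathrm{Tr}[S (S^H S)^{-1} S^H])^{1/2} = (N^{-1} \mathrm{Tr}[I_{n_t}])^{1/2} = \left(\frac{n_t}{N}\right)^{1/2}.$$

Then using the above lemma, we have

$$|Q_N P_S| \le \|Q_N\|\, |P_S| = \left(\frac{n_t}{N}\right)^{1/2} \|Q_N\| \le \left(\frac{n_t}{N}\right)^{1/2} 2 M_q.$$

Similarly, $|P_S Q_N P_S| \le \|P_S Q_N\| \left(\frac{n_t}{N}\right)^{1/2} \le \|Q_N\| \left(\frac{n_t}{N}\right)^{1/2} \le \left(\frac{n_t}{N}\right)^{1/2} 2 M_q$ and $|P_S Q_N| = |Q_N P_S| \le \left(\frac{n_t}{N}\right)^{1/2} 2 M_q$. Then, we can show that

$$\lim_{N \to \infty} |Q_N - P_S^{\perp} Q_N P_S^{\perp}| \le 3 \lim_{N \to \infty} \left(\frac{n_t}{N}\right)^{1/2} 2 M_q = 0.$$

□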
Based on the above lemma, the transmit channel correlation matrix $R_t$ can be estimated by projecting the received signal onto $\mathcal{R}(S)$. Since $N$ is usually much larger than $n_t$, we have

$$R_s \approx S R_t S^H + P_S^{\perp} Q_N P_S^{\perp}, \tag{3.56}$$

and hence

$$P_S R_s P_S \approx P_S S R_t S^H P_S + P_S P_S^{\perp} Q_N P_S^{\perp} P_S = S R_t S^H. \tag{3.57}$$

Then we can estimate the transmit channel correlation matrix $R_t$ using

$$\hat{R}_t = (S^H S)^{-1} S^H \hat{R}_s S (S^H S)^{-1}. \tag{3.58}$$
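The estimator (3.58) is a two-sided pseudo-inverse applied to the estimate of $R_s$. The sketch below assumes NumPy and a full-column-rank $S$; the function name `estimate_Rt` is hypothetical. When the interference term lies entirely in $\mathcal{R}^{\perp}(S)$, as in (3.56), the recovery is exact.

```python
import numpy as np

def estimate_Rt(Rs_hat, S):
    """R_t estimate (S^H S)^{-1} S^H Rs_hat S (S^H S)^{-1}, as in (3.58)."""
    # L = (S^H S)^{-1} S^H is the left pseudo-inverse of S, so L @ S = I
    L = np.linalg.solve(S.conj().T @ S, S.conj().T)
    return L @ Rs_hat @ L.conj().T
```

For finite $N$ the component of $Q_N$ inside $\mathcal{R}(S)$ biases the estimate; by Lemma 3.4.2 that component becomes negligible in weak norm as $N$ grows.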

3.5  Numerical  Results 
In  this  section,  we  present  some  numerical  results  to  show  the  performance  gain  for  channel 
estimation  achieved  by  the  designed  optimal  training  sequences.  We  consider  a  MIMO  system 
with  3  transmit  antennas  and  3  receive  antennas.  The  antennas  form  uniform  linear  arrays  at  both 
the  transmitter  and  the  receiver.  For  a  small  angle  spread,  the  correlation  coefficient  between 
the  ith  and  the  jth  transmit  antenna  [67]  can  be  approximated  as: 

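$$[R_t]_{i,j} \approx \frac{1}{2\pi} \int_0^{2\pi} \exp\left\{-j 2\pi |i - j| \sin\Delta\, \frac{d_t}{\lambda} \sin\theta\right\} d\theta = J_0\left(2\pi |i - j| \sin\Delta\, \frac{d_t}{\lambda}\right), \tag{3.59}$$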

where $J_0(x)$ is the zeroth order Bessel function of the first kind, $\Delta$ is the angle spread, $d_t$ is the antenna spacing and $\lambda$ is the wavelength of a narrow-band signal. We set $d_t = 0.5\lambda$. In the simulations, we consider two channels with different transmit channel correlations: a high spatial correlation channel with $\Delta = 5°$ and a low spatial correlation channel with $\Delta = 25°$. The receive correlation matrix $R_r$ is calculated in the same way as the transmit correlation matrix with $\Delta = 25°$.

We consider two kinds of interference: the co-channel interference from other users in the same wireless system and jamming signals, which are usually modeled by autoregressive (AR) random processes.

We compare the channel estimation performance in terms of the total MSE for systems using different sets of training sequences. The following training sequence sets are considered for comparison: 1) the optimal training sequences described in Section 3.3; 2) the approximate optimal training sequences constructed based on the channel and interference statistics obtained by using the proposed estimation algorithm in Section 3.4; 3) the temporally optimal training sequences, for which the transmit channel correlation matrix is assumed to be an identity matrix and only temporal interference correlation is considered in designing the optimal training sequences (we also consider the approximate temporally optimal sequences, which are constructed based on the channel statistics obtained by using the proposed algorithm); 4) the spatially optimal training sequences, for which the interference is assumed to be temporally white and only transmit correlation is considered in designing the optimal training sequences (we also consider the approximate spatially optimal sequences, which are constructed based on the channel statistics obtained by using the proposed algorithm); 5) binary orthogonal sequences; 6) random sequences.
3.5.1     Co-channel  Interference 

In a cellular wireless communication system, co-channel interference (CCI) from other
cells exists due to frequency reuse. Hence, the interfering signals have the same signal format
as that of the desired user. We can express the interfering signal transmitted from the $i$th transmit
antenna of the $m$th interferer as

$$s_i^{(m)}(t) = \sqrt{P_m}\sum_{l=-\infty}^{\infty} b_{l,i}^{(m)}\,\psi(t - lT - \tau_m), \tag{3.60}$$

where $P_m$ is the transmit power of the $m$th interferer, and $\{b_{l,i}^{(m)}\}$ are data symbols transmitted
from the $i$th transmit antenna of the $m$th interferer. They are assumed to be i.i.d. binary random
variables with zero mean and unit variance. In addition, $\psi(t)$ is the symbol waveform and $T$ is
the symbol duration. It is assumed that the receiver is synchronized to the desired user but not
necessarily to the interfering signals, and $\tau_m$ is the symbol timing difference between the $m$th
interferer and the desired user signal. Without loss of generality, we assume $0 \le \tau_m < T$. The
elements of the interference symbol matrix $\mathbf{S}_i$ are samples at the matched filter output at the
receiver at time index $jT$. The $(j,i)$th element of $\mathbf{S}_i$ is

with

$$\hat{\psi}(t) = \int_{-\infty}^{\infty} \psi(t - s)\,\psi^{*}(s)\,ds, \tag{3.62}$$

where $\hat{\psi}(t)$ is the autocorrelation of the symbol waveform. For the co-channel interference, the
temporal interference correlation is due to the intersymbol interference in the sampled interfering
signals.

In the simulations, it is assumed that there are two interfering signals in the system and the
SIR (signal-to-interference ratio) is set to 0 dB. The ISI-free symbol waveform with raised
cosine spectrum is chosen as the symbol waveform. For this case, we have

We set the roll-off factor $\beta = 0.5$, $\tau_1 = 0.2T$ and $\tau_2 = 0.5T$.

In Fig. 3.1 and Fig. 3.2, we show the total channel estimation MSEs for the high spatial
correlation channel and the low spatial correlation channel, respectively. In both cases, the
optimal sequences outperform the orthogonal sequences and the random sequences significantly. For
the high spatial correlation channel, the optimal sequences provide a substantial performance
gain over both the spatially optimal sequences and the temporally optimal sequences, and the
approximate optimal sequences achieve most of the performance gain obtained by the optimal
sequences. For the low spatial correlation channel, the MSE performance of the approximate
optimal sequences is close to that of the optimal sequences. The temporal correlation has a
stronger impact on the channel estimation than the spatial channel correlation because
the length of the training sequences $N$ is much larger than the number of transmit antennas $t$.
This is confirmed by the simulation results in Fig. 3.2, where the temporally optimal sequences
achieve an estimation performance similar to that of the optimal sequences, and both
provide a significant performance gain over the spatially optimal sequences.

3.5.2     Jamming Signals

We assume that there are two jamming signals in the system. The jamming signals are
modeled as two first-order AR processes driven by temporally white Gaussian processes $\{u_{i,t}\}$
as

$$s_{i,t} = \alpha_i s_{i,t-1} + u_{i,t}, \tag{3.63}$$

where $s_{i,t}$ represents the jamming signal transmitted by the $i$th jammer at the $t$th time index, $\alpha_i$
is the temporal correlation coefficient, and $u_{i,t}$ has zero mean with variance $\sigma_i^2$, which determines
the transmit power of the $i$th jammer. The SIR is set to 0 dB. We choose $\alpha_1 = 0.4$ and
$\alpha_2 = 0.5$.

Figure 3.1: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and high spatial correlation channel.

Figure 3.2: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and low spatial correlation channel.

In Fig. 3.3 and Fig. 3.4, we show the total channel estimation MSEs for the high spatial
correlation channel and the low spatial correlation channel, respectively. For AR jammers, the
conclusions on the estimation performance achieved by the different training sequences are
similar to those drawn in the case of co-channel interference.
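The temporal coloring produced by a first-order AR jammer can be made concrete with a short sketch. It relies on the standard stationary AR(1) result that the lag-$k$ covariance is $\sigma^2\alpha^{|k|}/(1-\alpha^2)$ for $|\alpha| < 1$; stationarity, the unit driving variance and the window length of four samples are assumptions made here for illustration, and the function name is a placeholder.

```python
def ar1_covariance(alpha, drive_variance, length):
    """Stationary covariance matrix of s_t = alpha * s_{t-1} + u_t.

    Entry (j, k) equals drive_variance * alpha**|j-k| / (1 - alpha**2),
    which is only valid for |alpha| < 1.
    """
    if abs(alpha) >= 1.0:
        raise ValueError("a stationary AR(1) process needs |alpha| < 1")
    power = drive_variance / (1.0 - alpha * alpha)
    return [[power * alpha ** abs(j - k) for k in range(length)]
            for j in range(length)]

# Coefficients quoted in the text; unit driving variance and length 4 are placeholders.
for alpha in (0.4, 0.5):
    cov = ar1_covariance(alpha, 1.0, 4)
    print(alpha, [round(value, 4) for value in cov[0]])
```

The geometric decay along each row is the temporal interference correlation that a temporally optimal training design would exploit.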

3.6     Conclusion 

In this chapter, we consider a wireless communication system with multiple transmit and
receive antennas in a slow, Rayleigh flat-fading environment. We study the problem of
estimating correlated MIMO channels with colored interference. The Bayesian channel
estimator is derived and the optimal training sequences are designed based on the mean square
error (MSE) of channel estimation. We propose an algorithm to estimate long-term channel
statistics and design an efficient feedback scheme so that we can approximately construct the
optimal sequences at the transmitter. Numerical results show that the optimal training sequences
provide a substantial performance gain for channel estimation when compared with other training
sequences.

Figure 3.3: Comparison of total MSEs obtained using different training sequences. AR jammers
and high spatial correlation channel.

Figure 3.4: Comparison of total MSEs obtained using different training sequences. AR jammers
and low spatial correlation channel.

3.7     Appendix 

3.7.1     A  Trace  Problem 

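In this appendix, we analyze a variant of the optimization problem (3.18), which can be
formulated as

$$\min_{\mathbf{S}}\ \operatorname{tr}\left(\mathbf{R}_t^{1/2}\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S}\mathbf{R}_t^{1/2} + \mathbf{I}_t\right)^{-1} \tag{3.64}$$

subject to $\operatorname{tr}\{\mathbf{S}^H\mathbf{S}\} \le P$.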

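The two trace optimization problems (3.18) and (3.64) are related through the form of their cost
functions. The cost function of the original optimization problem (3.18) can be rewritten as

$$\operatorname{tr}\left(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1}\right)^{-1} = \operatorname{tr}\,\mathbf{R}_t\left(\mathbf{R}_t^{1/2}\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S}\mathbf{R}_t^{1/2} + \mathbf{I}_t\right)^{-1},$$

which can be viewed as a weighted version of the cost function of the new trace optimization problem
(3.64).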
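The rewriting step can be checked in a few lines. The sketch below uses the shorthand $\mathbf{M}$ (introduced here only for illustration) for the Hermitian term built from the training matrix and the inverse interference covariance, and assumes $\mathbf{R}_t$ is Hermitian positive definite so that $\mathbf{R}_t^{1/2}$ and $\mathbf{R}_t^{-1/2}$ exist.

```latex
\begin{align*}
\mathbf{M} + \mathbf{R}_t^{-1}
  &= \mathbf{R}_t^{-1/2}\left(\mathbf{R}_t^{1/2}\mathbf{M}\mathbf{R}_t^{1/2} + \mathbf{I}_t\right)\mathbf{R}_t^{-1/2}, \\
\left(\mathbf{M} + \mathbf{R}_t^{-1}\right)^{-1}
  &= \mathbf{R}_t^{1/2}\left(\mathbf{R}_t^{1/2}\mathbf{M}\mathbf{R}_t^{1/2} + \mathbf{I}_t\right)^{-1}\mathbf{R}_t^{1/2}, \\
\operatorname{tr}\left(\mathbf{M} + \mathbf{R}_t^{-1}\right)^{-1}
  &= \operatorname{tr}\,\mathbf{R}_t\left(\mathbf{R}_t^{1/2}\mathbf{M}\mathbf{R}_t^{1/2} + \mathbf{I}_t\right)^{-1}.
\end{align*}
```

The last line uses the cyclic property of the trace to move one factor $\mathbf{R}_t^{1/2}$ to the front.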

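For notational simplicity, we restate the optimization problem (3.64) with different but
simpler notation:

$$\min_{\mathbf{S}}\ \operatorname{tr}\left(\mathbf{D}\mathbf{S}^H\mathbf{Q}\mathbf{S}\mathbf{D} + \mathbf{I}\right)^{-1} \tag{3.65}$$

subject to $\operatorname{tr}(\mathbf{S}^H\mathbf{S}) \le P$, $\mathbf{S} \in \mathbb{C}^{m\times n}$.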

Here $\mathbf{Q}$ is a nonzero Hermitian, positive semidefinite matrix, $\mathbf{D}$ is a nonzero Hermitian, positive
definite matrix, and the positive scalar $P$ is the power constraint associated with the signal
$\mathbf{S}$. The main results on the solution to the optimization problem (3.65) are cited here for
completeness; the details can be found in [81]. For convenience, we write the
inverse matrix as

$$\mathbf{C} = \left(\mathbf{D}\mathbf{S}^H\mathbf{Q}\mathbf{S}\mathbf{D} + \mathbf{I}\right)^{-1}.$$

As discussed before, the solution in the special case $\mathbf{D} = \mathbf{I}$ can be expressed in terms
of the eigenvalues and eigenvectors of $\mathbf{Q}$ and a Lagrange multiplier associated with the power
constraint. For the optimization problem introduced here, $\mathbf{D} \ne \mathbf{I}$ and minimizing the trace of
$\mathbf{C}$ is more difficult. We will show that (3.65) has a solution that can be expressed as $\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$,
where $\mathbf{U}$ and $\mathbf{V}$ are orthonormal matrices of eigenvectors for $\mathbf{Q}$ and $\mathbf{D}$ respectively, and $\boldsymbol{\Sigma}$
is diagonal. Solving (3.65) involves computing diagonalizations of $\mathbf{Q}$ and $\mathbf{D}$, and finding an
ordering for the columns of $\mathbf{U}$ and $\mathbf{V}$. We are able to evaluate the optimal ordering when
$P$ is either large or small. For intermediate values of $P$, however, evaluating the optimal ordering
is more difficult. Unlike the special case $\mathbf{D} = \mathbf{I}$, the problem (3.65) has a combinatorial nature.

The trace problem (3.65) arises in spreading sequence optimization for code division
multiple access (CDMA) systems. In cellular communication systems, multiple access schemes
allow many users to share a finite amount of radio resources simultaneously. CDMA is one of
the main access techniques. It is adopted in the IS-95 system and will be used in next generation
cellular communication systems [82]. In a CDMA system, different users are assigned different
spreading sequences so that the users can share the communication channel. We consider the
uplink (communication from the mobile units to the base station) of a CDMA system where the
users within a base station are symbol synchronous. The co-channel interference from the users
in the neighboring cells is modeled by additive, colored Gaussian noise. The received signal
at the base station is

$$\mathbf{y} = \sum_{i=1}^{K} h_i \mathbf{s}_i x_i + \mathbf{e},$$
where $K$ is the number of signals received by the base station, $x_i$ is the symbol transmitted from
the $i$th user, $\mathbf{s}_i \in \mathbb{C}^N$ is the spreading sequence assigned to the $i$th user, $h_i$ is the channel gain
from the $i$th user to the base station, and $\mathbf{e} \in \mathbb{C}^N$ is the additive, colored Gaussian noise with
zero mean and covariance $\mathbf{E}$. Usually $K$ and $N$ are comparable in size. It is assumed that
the symbols $x_i$ are independent with zero mean and unit variance. The received signal can be
expressed as

$$\mathbf{y} = \mathbf{S}\mathbf{H}\mathbf{x} + \mathbf{e}, \tag{3.66}$$

where $\mathbf{S}$, the spreading sequence matrix, has $j$th column $\mathbf{s}_j$, and $\mathbf{H}$ is a diagonal matrix with
$i$th diagonal element $h_i$. Again, by the Bayesian Gauss-Markov Theorem [36, 83], the MMSE
estimator of $\mathbf{x}$ is

$$\hat{\mathbf{x}} = \left(\mathbf{H}^H\mathbf{S}^H\mathbf{E}^{-1}\mathbf{S}\mathbf{H} + \mathbf{I}\right)^{-1}\mathbf{H}^H\mathbf{S}^H\mathbf{E}^{-1}\mathbf{y}.$$

The corresponding covariance matrix of the estimation error is

$$\left(\mathbf{H}^H\mathbf{S}^H\mathbf{E}^{-1}\mathbf{S}\mathbf{H} + \mathbf{I}\right)^{-1}.$$

The problem of finding the spreading sequences for all the users that minimize the co-channel
interference to other cells, subject to a power constraint, corresponds to (3.65) with
$\mathbf{Q} = \mathbf{E}^{-1}$ and $\mathbf{D} = \mathbf{H}$, a diagonal matrix.

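To solve the trace optimization problem, we begin by analyzing the structure of an optimal
solution to (3.65). Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}$ and $\mathbf{D}$ respectively (the
columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors). Let $\delta_i$, $1 \le i \le n$, and $\lambda_j$, $1 \le j \le m$,
denote the diagonal elements of $\boldsymbol{\Delta}$ and $\boldsymbol{\Lambda}$ respectively. We assume that the eigenvalues are
arranged in decreasing order:

$$\delta_1 \ge \delta_2 \ge \cdots \ge \delta_n \quad\text{and}\quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m. \tag{3.67}$$

Let us define

$$\mathbf{T} = \mathbf{U}^H\mathbf{S}\mathbf{V}. \tag{3.68}$$

Making the substitution $\mathbf{S} = \mathbf{U}\mathbf{T}\mathbf{V}^H$ in (3.65) yields the following equivalent problem:

$$\min_{\mathbf{T}}\ \operatorname{tr}\left(\boldsymbol{\Delta}\mathbf{T}^H\boldsymbol{\Lambda}\mathbf{T}\boldsymbol{\Delta} + \mathbf{I}\right)^{-1} \tag{3.69}$$

subject to $\operatorname{tr}(\mathbf{T}^H\mathbf{T}) \le P$, $\mathbf{T} \in \mathbb{C}^{m\times n}$.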

We now show that (3.69) has a solution with at most one nonzero in each row and column.

Theorem 3.7.1. There exists a solution of (3.69) of the form $\mathbf{T} = \boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2$, where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$
are permutation matrices and $\sigma_{ij} = 0$ for all $i \ne j$.

Combining the relationship (3.68) between $\mathbf{T}$ and $\mathbf{S}$ with Theorem 3.7.1, we conclude that
problem (3.65) has a solution of the form $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2\mathbf{V}^H$, where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$ are permutation
matrices. We will now show that one of these two permutation matrices can be deleted if the
eigenvalues of $\mathbf{D}$ and $\mathbf{Q}$ are arranged in decreasing order.

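Let $N$ denote the minimum of $m$ and $n$. Making the substitution $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2\mathbf{V}^H$ in
(3.65), we obtain the equivalent problem:

$$\min_{\boldsymbol{\Sigma},\boldsymbol{\Pi}_1,\boldsymbol{\Pi}_2}\ \operatorname{tr}\left((\boldsymbol{\Pi}_2\boldsymbol{\Delta}\boldsymbol{\Pi}_2^T)\boldsymbol{\Sigma}^H(\boldsymbol{\Pi}_1^T\boldsymbol{\Lambda}\boldsymbol{\Pi}_1)\boldsymbol{\Sigma}(\boldsymbol{\Pi}_2\boldsymbol{\Delta}\boldsymbol{\Pi}_2^T) + \mathbf{I}\right)^{-1} \tag{3.70}$$

subject to $\sum_{i=1}^{N} \sigma_i^2 \le P$.

Here the minimization is over diagonal matrices $\boldsymbol{\Sigma}$ with $\sigma_1, \ldots, \sigma_N$ on the diagonal, and
permutation matrices $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$.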

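The symmetric permutations $\boldsymbol{\Pi}_1^T\boldsymbol{\Lambda}\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2\boldsymbol{\Delta}\boldsymbol{\Pi}_2^T$ essentially interchange diagonal
elements of $\boldsymbol{\Lambda}$ and $\boldsymbol{\Delta}$. Hence, (3.70) is equivalent to

$$\min_{\boldsymbol{\sigma},\pi_1,\pi_2}\ \sum_{i=1}^{N} \frac{1}{\sigma_i^2\,\lambda_{\pi_1(i)}\,\delta_{\pi_2(i)}^2 + 1} \tag{3.71}$$

subject to $\sum_{i=1}^{N} \sigma_i^2 \le P$, $\pi_1 \in \mathcal{P}_m$, $\pi_2 \in \mathcal{P}_n$,

where $\mathcal{P}_m$ is the set of bijections of $\{1, 2, \ldots, m\}$ onto itself.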

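We first show that we can restrict our attention to the largest diagonal elements of $\mathbf{D}$ and
$\mathbf{Q}$.

Lemma 3.7.1. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}$ and $\mathbf{D}$ respectively, where
the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors. Let $\boldsymbol{\sigma}$, $\pi_1$, and $\pi_2$ denote an optimal
solution of (3.71) and define the sets

$$\mathcal{N} = \{i : \sigma_i > 0\}, \quad \mathcal{Q} = \{\lambda_{\pi_1(i)} : i \in \mathcal{N}\}, \quad\text{and}\quad \mathcal{D} = \{\delta_{\pi_2(i)} : i \in \mathcal{N}\}.$$

If $\mathcal{N}$ has $l$ elements, then the elements of the sets $\mathcal{D}$ and $\mathcal{Q}$ are all nonzero, and they constitute
the $l$ largest eigenvalues of $\mathbf{D}$ and $\mathbf{Q}$ respectively.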

Using Lemma 3.7.1, we now eliminate one of the permutations in (3.71).

Theorem 3.7.2. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}$ and $\mathbf{D}$ respectively, where
the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors, and the eigenvalues of $\mathbf{Q}$ and $\mathbf{D}$ are
arranged in decreasing order as in (3.67). If $K$ is the minimum of the ranks of $\mathbf{D}$ and $\mathbf{Q}$, then
(3.71) is equivalent to

$$\min_{\boldsymbol{\sigma},\pi}\ \sum_{i=1}^{K} \frac{1}{\sigma_i^2\,\delta_i^2\,\lambda_{\pi(i)} + 1} \tag{3.72}$$

subject to $\sum_{i=1}^{K} \sigma_i^2 \le P$, $\pi \in \mathcal{P}_K$,

where $\sigma_i = 0$ for $i > K$.

Proof. The proof is similar to that for Theorem 3.3.2. □

Corollary 3.7.1. Problem (3.65) has a solution of the form $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}\boldsymbol{\Sigma}\mathbf{V}^H$, where the columns of
$\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors of $\mathbf{Q}$ and $\mathbf{D}$ respectively with the associated eigenvalues
arranged in decreasing order, $\boldsymbol{\Pi}$ is a permutation matrix, and $\boldsymbol{\Sigma}$ is diagonal.

Proof. The proof is similar to that for Corollary 3.3.1. □

Assuming the permutation $\pi$ in (3.72) is given, let us now consider the problem of optimizing
over $\boldsymbol{\sigma}$. To simplify the indexing, let $\rho_i$ denote $\lambda_{\pi(i)}$. Hence, for fixed $\pi$, (3.72) is equivalent
to the following optimization problem:

$$\min_{\boldsymbol{\sigma}}\ \sum_{i=1}^{K} \frac{1}{\sigma_i^2\,\delta_i^2\,\rho_i + 1} \tag{3.73}$$

subject to $\sum_{i=1}^{K} \sigma_i^2 \le P$.

The solution of (3.73) can be expressed in terms of a Lagrange multiplier for the constraint.

Theorem 3.7.3. The optimal solution of (3.73) is given by

$$\sigma_i = \max\left\{\frac{1}{\sqrt{\mu\,\delta_i^2\rho_i}} - \frac{1}{\delta_i^2\rho_i},\ 0\right\}^{1/2}, \tag{3.74}$$

where the parameter $\mu$ is chosen so that

$$\sum_{i=1}^{K} \sigma_i^2 = P. \tag{3.75}$$

Proof. The proof is similar to that for Theorem 3.3.3. □

To solve (3.65), we need to find an optimal ordering for the eigenvalues of $\mathbf{D}$ and $\mathbf{Q}$. In
Theorems 3.7.4 and 3.7.5, we determine the optimal ordering when the power $P$ is either large
or small.

Theorem 3.7.4. If the eigenvalues $\{\lambda_i\}$ and $\{\delta_i\}$ of $\mathbf{Q}$ and $\mathbf{D}$ respectively are arranged in
decreasing order, then for $P$ sufficiently large, an optimal permutation in (3.72) is

$$\pi(i) = K + 1 - i, \quad 1 \le i \le K, \qquad \pi(i) = i, \quad i > K. \tag{3.76}$$

Theorem 3.7.5. Suppose the eigenvalues $\{\lambda_i\}$ and $\{\delta_i\}$ of $\mathbf{Q}$ and $\mathbf{D}$ respectively are arranged in
decreasing order, and let $L$ be the minimum of the multiplicities of $\delta_1$ and $\lambda_1$. For $P$ sufficiently
small, an optimal solution of (3.65) is

L 

L 

where $\mathbf{u}_i$ and $\mathbf{v}_i$ are the orthonormalized eigenvectors of $\mathbf{Q}$ and $\mathbf{D}$ associated with $\lambda_1$ and $\delta_1$
respectively.
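The interplay between the power allocation of Theorem 3.7.3 and the ordering results of Theorems 3.7.4 and 3.7.5 can be explored with a small brute-force experiment. The sketch below is only an illustration under stated assumptions: it works directly with the scalar problem (3.72), writes $x_i$ for $\sigma_i^2$ and $g_i$ for $\delta_i^2\lambda_{\pi(i)}$, parameterizes the multiplier through a water level $c$ (a reparameterization chosen here, not the source's notation) found by bisection, and uses made-up eigenvalues.

```python
import itertools
import math

def waterfill_trace(gains, power, iters=200):
    """Minimize sum 1/(x_i*g_i + 1) subject to sum x_i = power, x_i >= 0.

    Stationarity of the Lagrangian gives x_i = max(c/sqrt(g_i) - 1/g_i, 0),
    where the level c > 0 is fixed by the power constraint (bisection).
    """
    def alloc(c):
        return [max(c / math.sqrt(g) - 1.0 / g, 0.0) for g in gains]

    lo, hi = 0.0, 1.0
    while sum(alloc(hi)) < power:      # grow the bracket until it overshoots
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) < power:
            lo = mid
        else:
            hi = mid
    x = alloc(hi)
    return x, sum(1.0 / (xi * g + 1.0) for xi, g in zip(x, gains))

def best_permutation(deltas, lambdas, power):
    """Exhaustive search over pairings of delta_i with lambda_{pi(i)}."""
    best = None
    for perm in itertools.permutations(range(len(lambdas))):
        gains = [d * d * lambdas[p] for d, p in zip(deltas, perm)]
        _, cost = waterfill_trace(gains, power)
        if best is None or cost < best[1] - 1e-12:
            best = (perm, cost)
    return best

# Hypothetical eigenvalues, both lists in decreasing order (0-based indices).
deltas = [2.0, 1.0, 0.5]
lambdas = [3.0, 2.0, 1.0]
for power in (0.01, 1000.0):
    perm, cost = best_permutation(deltas, lambdas, power)
    print("P =", power, "best pairing:", perm, "cost:", round(cost, 6))
```

For these toy values the search pairs the largest $\delta$ with the smallest $\lambda$ (the reversed ordering) at the large power level, and pairs the two largest eigenvalues at the small power level, in line with the two theorems. The exhaustive search grows factorially with $K$, which reflects the combinatorial nature of the problem at intermediate power levels.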
3.7.2     A  Determinant  Problem 

In this appendix, we analyze the following matrix optimization problem, in which we
maximize the determinant, denoted "det", of a matrix:

$$\max_{\mathbf{S}}\ \det\left(\mathbf{D}\mathbf{S}^H\mathbf{Q}\mathbf{S}\mathbf{D} + \mathbf{I}\right) \tag{3.78}$$

subject to $\operatorname{tr}(\mathbf{S}^H\mathbf{S}) \le P$, $\mathbf{S} \in \mathbb{C}^{m\times n}$.

Since the determinant of the inverse of a matrix is the reciprocal of the determinant of the matrix,
it follows that problem (3.78) is equivalent to replacing trace by determinant in (3.65). Hence,
in the original problem (3.65), we minimize the sum of the eigenvalues of the MSE matrix $\mathbf{C}$,
while in the second problem (3.78), we minimize the product of the eigenvalues of $\mathbf{C}$. In either
case, we try to make the eigenvalues of $\mathbf{C}$ small, but with different metrics.

For the special case $\mathbf{D} = \mathbf{I}$, the solution of (3.78) can be found in Telatar [1], and for the
special case $\mathbf{Q} = \mathbf{I}$, the solution of (3.78) can be found in Zhou [63]. For the more general
problem (3.78), we again show that the solution can be expressed as $\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $\mathbf{U}$ and $\mathbf{V}$
are orthonormal matrices of eigenvectors for $\mathbf{Q}$ and $\mathbf{D}$ respectively, and $\boldsymbol{\Sigma}$ is diagonal. Unlike
the trace problem (3.65), the ordering of the columns of $\mathbf{U}$ and $\mathbf{V}$ does not depend on the
power $P$: the columns of $\mathbf{U}$ and $\mathbf{V}$ should be ordered so that the associated eigenvalues of
$\mathbf{Q}$ and $\mathbf{D}$ are in decreasing order. This optimal eigenvector ordering result is the same as that
for the optimization problem (3.18) in Section 3.3 when the same notation is adopted for the
corresponding matrices. In Cai et al. [65], the authors formulated a similar optimization problem
while studying the space-time spreading (STS) scheme for correlated fading channels in the
presence of interference. Based on the previous optimization result for the special case $\mathbf{Q} = \mathbf{I}$
[63], $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ was chosen as the STS matrix, and then the optimal eigenvector ordering and $\boldsymbol{\Sigma}$
were decided. Here we solve the optimization problem (3.78) by using the method introduced
in Wong et al. [61] and Wong et al. [84]. (Two important matrix inequalities arising from
majorization theory [40] are used.)

The determinant problem arises from spreading sequence optimization for CDMA systems.
For CDMA systems, a different performance measure, which arises in information theory, is
the sum capacity of the channel. The mean square error is a performance measure for uncoded
systems, while the sum capacity is a performance measure for coded systems. It represents the
maximum sum of the rates at which users can transmit information reliably. The sum capacity
of the synchronous multiple access channel (3.66) is

$$C_{\text{sum}} = \max\ I(x_1, \ldots, x_K;\ \mathbf{y}),$$

where $I$ represents the mutual information [74] between the inputs $x_1, x_2, \ldots, x_K$ and the
output vector $\mathbf{y}$. The maximization is over the independent random inputs $x_1, x_2, \ldots, x_K$. The
maximum is achieved when all the random inputs are Gaussian. In this case, the sum capacity
[71, 85] becomes

$$C_{\text{sum}} = \frac{1}{2}\log\det\left(\mathbf{H}^H\mathbf{S}^H\mathbf{E}^{-1}\mathbf{S}\mathbf{H} + \mathbf{I}\right).$$

Since log is a monotone increasing function, the maximization of the sum capacity, subject to a
power constraint, corresponds to the optimization problem (3.78) with $\mathbf{Q} = \mathbf{E}^{-1}$ and $\mathbf{D} = \mathbf{H}$.

The solution to the determinant problem (3.78) can be expressed as follows:

Theorem 3.7.6. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be the diagonalizations of $\mathbf{Q}$ and $\mathbf{D}$ respectively,
where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors and the corresponding eigenvalues
$\{\lambda_i\}$ and $\{\delta_i\}$ are arranged in decreasing order. If $K$ is the minimum of the ranks of $\mathbf{Q}$ and $\mathbf{D}$,
then the optimal solution of (3.78) is given by

$$\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \tag{3.79}$$

where $\boldsymbol{\Sigma}$ is diagonal with diagonal elements given by

$$\sigma_i = \max\left\{\frac{1}{\mu} - \frac{1}{\lambda_i\delta_i^2},\ 0\right\}^{1/2} \quad\text{for } 1 \le i \le K \tag{3.80}$$

and $\sigma_i = 0$ for $i > K$, where the parameter $\mu$ is chosen so that

$$\sum_{i=1}^{K} \sigma_i^2 = P.$$

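Proof. Initially, let us assume that both $\mathbf{D}$ and $\mathbf{Q}$ are nonsingular; later we remove this
restriction. Insert $\mathbf{T} = \mathbf{Q}^{1/2}\mathbf{S}$ in (3.78) and multiply the objective function on the left and right by
$\det(\mathbf{D}^{-1})$ to obtain the following equivalent formulation:

$$\max_{\mathbf{T}}\ \det\left(\mathbf{T}^H\mathbf{T} + \mathbf{D}^{-2}\right) \tag{3.81}$$

subject to $\operatorname{tr}(\mathbf{T}\mathbf{T}^H\mathbf{Q}^{-1}) \le P$, $\mathbf{T} \in \mathbb{C}^{m\times n}$.

Let $\omega_i$, $1 \le i \le n$, denote the eigenvalues of $\mathbf{T}^H\mathbf{T}$ arranged in decreasing order. By a theorem of
Fiedler [86] (also see [40, Chap. 9, G.4]), the determinant of a sum $\mathbf{T}^H\mathbf{T} + \mathbf{D}^{-2}$ of Hermitian
matrices is bounded by the product of the sums of the respective eigenvalues (assuming the
eigenvalues of $\mathbf{T}^H\mathbf{T}$ and $\mathbf{D}$ are in decreasing order):

$$\det\left(\mathbf{T}^H\mathbf{T} + \mathbf{D}^{-2}\right) \le \prod_{i=1}^{n}\left(\omega_i + \delta_i^{-2}\right). \tag{3.82}$$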

Also, by a theorem of Ruhe [87] (also see [40, Chap. 9, H.2]), the trace of a product $(\mathbf{T}\mathbf{T}^H)\mathbf{Q}^{-1}$
of Hermitian matrices is bounded from below by the sum of the products of the respective
eigenvalues (assuming the eigenvalues of $\mathbf{T}\mathbf{T}^H$ and $\mathbf{Q}$ are in decreasing order):

$$\operatorname{tr}(\mathbf{T}\mathbf{T}^H\mathbf{Q}^{-1}) \ge \sum_{i=1}^{N} \omega_i\lambda_i^{-1}, \quad N = \min\{m, n\}, \tag{3.83}$$

since at most $N$ eigenvalues of $\mathbf{T}^H\mathbf{T}$ and $\mathbf{T}\mathbf{T}^H$ are nonzero.

We replace the cost function in (3.81) by the upper bound (3.82) and the constraint
in (3.81) by the lower bound (3.83) to obtain the problem:

$$\max_{\boldsymbol{\omega}}\ \left(\prod_{i=N+1}^{n} \delta_i^{-2}\right)\prod_{i=1}^{N}\left(\omega_i + \delta_i^{-2}\right) \tag{3.84}$$

subject to $\sum_{i=1}^{N} \omega_i\lambda_i^{-1} \le P$, $\omega_i \ge \omega_{i+1} \ge 0$ for $i < N$.

If $\mathbf{T}$ is feasible in (3.81), then the squares of its singular values are feasible in (3.84) by (3.83).
And by (3.82), the value of the cost function in (3.84) is greater than or equal to the associated
value in (3.81). Since the feasible set for (3.84) is closed and bounded, and since the cost function
is continuous, there exists a maximizing $\boldsymbol{\omega}$, and the maximum value of the cost function (3.84)
is greater than or equal to the maximum value in (3.81).

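Consider the matrix $\mathbf{T} = \mathbf{U}\boldsymbol{\Omega}^{1/2}\mathbf{V}^H$, where $\boldsymbol{\Omega}$ is a diagonal matrix containing the
maximizing $\boldsymbol{\omega}$ on the diagonal. For this choice of $\mathbf{T}$, the inequalities (3.82) and (3.83) are both
equalities. Hence, this choice for $\mathbf{T}$ attains the maximum in (3.81). The corresponding optimal
solution of (3.78) is

$$\mathbf{S} = \mathbf{Q}^{-1/2}\mathbf{T} = \mathbf{U}\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^H\mathbf{U}\boldsymbol{\Omega}^{1/2}\mathbf{V}^H = \mathbf{U}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Omega}^{1/2}\mathbf{V}^H. \tag{3.85}$$

To complete the proof of the theorem, we need to explain how to compute the optimal $\boldsymbol{\omega}$ in
(3.84).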

At the optimal solution of (3.84), the power constraint must be an equality (otherwise, we
could multiply $\boldsymbol{\omega}$ by a scalar greater than one and increase the cost). Let us ignore the monotonicity
constraint $\omega_i \ge \omega_{i+1}$ (we will show that the maximizer satisfies this constraint automatically).
After taking the log of the cost function, we obtain the following simplified version of (3.84):

$$\max_{\boldsymbol{\omega}}\ \sum_{i=1}^{N} \log\left(\omega_i + \delta_i^{-2}\right) \tag{3.86}$$

subject to $\sum_{i=1}^{N} \omega_i\lambda_i^{-1} = P$, $\boldsymbol{\omega} \ge 0$.

Since the cost function is strictly concave, the maximizer of (3.86) is unique.

The first-order optimality conditions (KKT conditions) for an optimal solution of (3.86)
are the following: there exist a scalar $\mu \ge 0$ and a vector $\boldsymbol{\nu} \in \mathbb{R}^N$ such that

$$-\frac{1}{\omega_i + \delta_i^{-2}} + \frac{\mu}{\lambda_i} - \nu_i = 0, \quad \nu_i \ge 0, \quad \omega_i \ge 0, \quad \nu_i\omega_i = 0, \quad 1 \le i \le N. \tag{3.87}$$

Analogous to the proof of Theorem 3.7.3, we define the function

$$\omega_i(\mu) = \left(\frac{\lambda_i}{\mu} - \frac{1}{\delta_i^2}\right)^{+}. \tag{3.88}$$

This particular value for $\omega_i$ is obtained by setting $\nu_i = 0$ in (3.87) and solving for $\omega_i$; when the
solution is negative, we set $\omega_i(\mu) = 0$ (this corresponds to the $+$ operator in (3.88)). Observe that
$\omega_i(\mu)$ in (3.88) is a decreasing function of $\mu$ which approaches $+\infty$ as $\mu$ approaches 0 and
which approaches 0 as $\mu$ tends to $+\infty$. Hence, the equation

$$\sum_{i=1}^{N} \omega_i(\mu)\lambda_i^{-1} = P \tag{3.89}$$

has a unique positive solution. We have $\omega_i = 0$ for $\mu \ge \lambda_i\delta_i^2$, which implies that

$$\nu_i = -\frac{1}{\omega_i + \delta_i^{-2}} + \frac{\mu}{\lambda_i} = -\delta_i^2 + \frac{\mu}{\lambda_i} \ge -\delta_i^2 + \delta_i^2 = 0 \quad\text{when}\quad \mu \ge \lambda_i\delta_i^2.$$

It follows that the KKT conditions are satisfied when $\mu$ is the positive solution of (3.89). Since
the $\lambda_i$ and $\delta_i$ are both arranged in decreasing order, it follows that for any choice $\mu > 0$, the $\omega_i$
given by (3.88) are in decreasing order. Hence, the constraint $\omega_{i+1} \le \omega_i$ in (3.84) is satisfied
by the solution of (3.86). Combining the formula (3.88) for the solution of (3.86) with the
expression (3.85) for the solution of (3.78), we obtain the solution $\mathbf{S}$ given in (3.79) and (3.80),
where $\boldsymbol{\Sigma} = \boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Omega}^{1/2}$.

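Now suppose that either $\mathbf{D}$ or $\mathbf{Q}$ is singular. Let us consider a perturbed problem where
we replace $\mathbf{Q}$ by $\mathbf{Q}_\epsilon = \mathbf{U}\boldsymbol{\Lambda}_\epsilon\mathbf{U}^H$ and $\mathbf{D}$ by $\mathbf{D}_\epsilon = \mathbf{V}\boldsymbol{\Delta}_\epsilon\mathbf{V}^H$:

$$\max_{\mathbf{S}}\ \det\left(\mathbf{D}_\epsilon\mathbf{S}^H\mathbf{Q}_\epsilon\mathbf{S}\mathbf{D}_\epsilon + \mathbf{I}\right) \tag{3.90}$$

subject to $\operatorname{tr}(\mathbf{S}^H\mathbf{S}) \le P$, $\mathbf{S} \in \mathbb{C}^{m\times n}$.

Here $\boldsymbol{\Lambda}_\epsilon$ and $\boldsymbol{\Delta}_\epsilon$ are obtained from $\boldsymbol{\Lambda}$ and $\boldsymbol{\Delta}$ by setting $\delta_i = \epsilon = \lambda_j$ for $i$ or $j > K$. Since $\mathbf{Q}_\epsilon$
and $\mathbf{D}_\epsilon$ are nonsingular, it follows from our previous analysis that the perturbed problem (3.90)
has a solution of the form $\mathbf{S}_\epsilon = \mathbf{U}\boldsymbol{\Sigma}_\epsilon\mathbf{V}^H$, where the diagonal elements of $\boldsymbol{\Sigma}_\epsilon$ are given by

$$\sigma_i^\epsilon = \begin{cases} \max\left\{\dfrac{1}{\mu} - \dfrac{1}{\lambda_i\delta_i^2},\ 0\right\}^{1/2} & \text{for } 1 \le i \le K, \\[2ex] \max\left\{\dfrac{1}{\mu} - \dfrac{1}{\epsilon^3},\ 0\right\}^{1/2} & \text{for } i > K. \end{cases} \tag{3.91}$$

Let $\mu$ be chosen so that

$$\sum_{i=1}^{K} (\sigma_i^\epsilon)^2 = P.$$

Observe that when $\epsilon^3 < \mu$, we have $\sigma_i^\epsilon = 0$ for $i > K$ and

$$\sum_{i=1}^{N} (\sigma_i^\epsilon)^2 = P.$$

Hence, for each $\epsilon > 0$ with $\epsilon^3 < \mu$, the optimal solution of the perturbed problem does not
depend on $\epsilon$ and the trailing diagonal elements $\sigma_i^\epsilon$ for $i > K$ vanish. Since the cost function in
the perturbed problem (3.90) is a continuous function of $\epsilon$, we conclude that for $\epsilon^3 < \mu$, $\mathbf{S}_\epsilon$ is the
optimal solution of (3.90) for $\epsilon = 0$. The perturbed problem (3.90) with $\epsilon = 0$ coincides with
the original problem (3.78). Consequently, the solution (3.79) and (3.80) is valid, even when
either $\mathbf{Q}$ or $\mathbf{D}$ is singular. □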
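The power allocation in Theorem 3.7.6 has the familiar water-filling form, which the following sketch evaluates numerically. It is a minimal illustration, assuming (3.80) is read as $\sigma_i^2 = \max\{1/\mu - 1/(\lambda_i\delta_i^2),\, 0\}$ with the $\sigma_i^2$ summing to $P$; the water level `level` stands for $1/\mu$, the bisection is an implementation choice made here, and the eigenvalues are made up.

```python
import math

def waterfill_det(lambdas, deltas, power, iters=200):
    """Return sigma_i with sigma_i**2 = max(level - 1/(lambda_i*delta_i**2), 0).

    The water level (1/mu in the assumed reading of (3.80)) is found by
    bisection so that the squared sigma_i add up to the power budget.
    """
    floors = [1.0 / (lam * d * d) for lam, d in zip(lambdas, deltas)]
    lo, hi = 0.0, max(floors) + power   # at hi the budget is certainly met
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = sum(max(level - f, 0.0) for f in floors)
        if used < power:
            lo = level
        else:
            hi = level
    return [math.sqrt(max(hi - f, 0.0)) for f in floors]

# Hypothetical eigenvalues in decreasing order and a unit power budget.
sigma = waterfill_det([3.0, 2.0, 1.0], [2.0, 1.0, 0.5], 1.0)
print([round(s * s, 4) for s in sigma])
```

Because both eigenvalue lists are in decreasing order, the floors $1/(\lambda_i\delta_i^2)$ increase with $i$, so the strongest eigen-direction pair receives the most power and weak pairs may receive none.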


CHAPTER  4 
CONCLUSION  AND  FUTURE  WORK 

To achieve the performance gain promised by multiple antenna systems, parameter
estimations, including timing estimation and channel estimation, are key components of the space-time
system design. In this work, we investigate the timing estimation and channel estimation
problems for MIMO systems.

4.1     Timing  Estimation  for  Rayleigh  Flat-fading  MIMO  Channels

In Chapter 2, we consider a wireless communication system with multiple transmit and
receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh flat-fading
environment. We study the problem of timing estimation in such a system with the aid of
training signals, using two different approaches. In the first approach, the channel is assumed to
be unknown but deterministic, and joint ML estimation of the channel and delay is performed. In
the second approach, we assume instead that the channel is random with known statistics
and use the likelihood function averaged over all random channel realizations to construct the
ML estimator for the delay. For both approaches, we derive the optimal training sequences
based on performance measures associated with the CRB of timing estimation. The two
approaches lead to two different optimal training signal designs. For the deterministic channel
approach, we show that orthogonal training signals from multiple transmit antennas minimize
the outage probability as well as the average CRB. For the random channel approach, perfectly
correlated training signals employed at different transmit antennas minimize the CRB.

4.2     Channel  Estimation  for  Correlated  MIMO  Channels  with  Colored  Interference 

In Chapter 3, we consider a wireless communication system with multiple transmit and
receive antennas in a slow, Rayleigh flat-fading environment. We investigate the problem of
estimating correlated MIMO channels in the presence of colored interference. The Bayesian
channel estimator is derived and the optimal training sequences are designed based on
minimizing the MSE of channel estimation. The design of the optimal training sequences has a
clear physical interpretation, which implies that we should assign more power to the
transmission direction constructed by the eigen-directions with larger channel gains and the interference
subspaces with less interference. The power assignment is determined by the water-filling
argument under a finite power constraint. In order to implement the channel estimator and construct
the optimal training sequences, we propose an algorithm to estimate long-term channel
statistics and design an efficient feedback scheme so that we can approximately construct the optimal
sequences at the transmitter. Numerical results show that with optimal training sequences, the
MSE of channel estimation can be reduced substantially when compared with other training
sequences.

4.3     Timing  Estimation  for  Correlated  MIMO  Channels  with  Colored  Noise 

In  the  second  chapter,  we  study  the  timing  estimation  problem  with  the  assumption  that 
the  fading  coefficients  between  the  pairs  of  transmit  and  receive  antennas  are  independent  and 
identically  distributed.  This  assumption  does  not  generally  hold  in  practice  due  to  the  antenna 
spacings  and  orientation,  the  mutual  coupling,  the  richness  of  scattering,  and  the  presence  of 
dominant  components  [88].  Thus  it  is  natural  to  extend  the  current  work  to  investigate  the 
synchronization  problem  in  correlated  channels. 

Another possible direction to extend the present work is to address the timing estimation
problem for the MIMO system in colored noise. It is more suitable to adopt the colored noise
model than the white noise model when jammers and co-channel interference are present in the
communication system.